Difference between INNER JOIN and WHERE? - tsql

First Query:
Select * from table1 inner join table2 on table1.Id = table2.Id
Second Query:
Select * from table1, table2 where table1.Id = table2.Id
What is the difference between these queries in terms of performance, and which one should I use?

The two statements you posted are logically identical. There isn't really a
practical reason to prefer one over the other, it's largely a matter of
personal style and readability. Some people prefer the INNER JOIN syntax and
some prefer just to use WHERE.
Referring to Using Inner Joins:
In the ISO standard, inner joins can
be specified in either the FROM or
WHERE clause. This is the only type of
join that ISO supports in the WHERE
clause. Inner joins specified in the
WHERE clause are known as old-style
inner joins.
Referring to Join Fundamentals:
Specifying the join conditions in the
FROM clause helps separate them from
any other search conditions that may
be specified in a WHERE clause, and is
the recommended method for specifying
joins.
Personally, I prefer using INNER JOIN. I find it much clearer, as I can separate the join conditions from the filter conditions and use a separate join block for each joined table.

To amplify @Akram's answer: many people prefer the INNER JOIN syntax because it makes it easier to distinguish the join conditions (how the various tables in the FROM clause relate to each other) from the filter conditions (those that reduce the overall result set). There's no difference between them in this case, but on larger queries with more tables, the INNER JOIN form may improve readability.
In addition, once you start considering outer joins, you pretty much need to use the infix join syntax (LEFT OUTER JOIN, RIGHT OUTER JOIN), so many find a form of symmetry in using the same style for inner joins. There is an older, deprecated syntax for performing outer joins in the WHERE clause (using *=), but support for such joins is dying out.
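To check the "logically identical" claim on a small data set, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere), and the Name and City columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER, Name TEXT);  -- Name is a hypothetical column
    CREATE TABLE table2 (Id INTEGER, City TEXT);  -- City is a hypothetical column
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Explicit join syntax: the join condition lives in the FROM clause.
explicit_rows = conn.execute(
    "SELECT * FROM table1 INNER JOIN table2 ON table1.Id = table2.Id "
    "ORDER BY table1.Id"
).fetchall()

# Old-style syntax: comma-separated tables, join condition in the WHERE clause.
old_style_rows = conn.execute(
    "SELECT * FROM table1, table2 WHERE table1.Id = table2.Id "
    "ORDER BY table1.Id"
).fetchall()

print(explicit_rows)
print(explicit_rows == old_style_rows)
```

Both statements return the same two matched rows here. This only shows that the results agree; to compare performance on SQL Server you would still look at the actual execution plans.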

Related

AWS docs say that merge joins can be used for outer joins, but not full joins. Are those the same things?

Redshift documentation says:
Merge Join
Typically the fastest join, a merge join is used for inner joins and
outer joins. The merge join is not used for full joins.
But I've always read that full joins and outer joins are the same thing: rows from both tables are kept, regardless of whether they exist in the other table.
Are they just referring to left outer joins and right outer joins as those that work with a merge join, while "outer join"s (full outer joins) do not?
Good spot, perhaps the docs could be clearer. You can submit a pull request for our docs if you feel motivated to do so.
In this case, the doc means that a merge join is used when one table is the "primary" table, e.g., it will be used for INNER JOIN, LEFT [OUTER] JOIN, and RIGHT [OUTER] JOIN, but not for FULL [OUTER] JOIN, where rows from both sides must be retained.
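To make the distinction between the outer join flavours concrete, here is a small sketch using Python's sqlite3 module rather than Redshift (an assumption, chosen so it runs locally). The tables a and b are hypothetical. Because older SQLite versions lack RIGHT and FULL joins, the right join is written as a swapped LEFT JOIN and the full join is emulated as the union of the two; a real FULL OUTER JOIN would also preserve duplicate rows, which a set union does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# LEFT OUTER JOIN: every row of a is kept; b.id is NULL where nothing matches.
left_rows = set(conn.execute(
    "SELECT a.id, b.id FROM a LEFT OUTER JOIN b ON a.id = b.id"
).fetchall())

# RIGHT OUTER JOIN, written as a LEFT JOIN with the tables swapped:
# every row of b is kept; a.id is NULL where nothing matches.
right_rows = set(conn.execute(
    "SELECT a.id, b.id FROM b LEFT OUTER JOIN a ON a.id = b.id"
).fetchall())

# FULL OUTER JOIN keeps unmatched rows from BOTH sides (emulated as a union here).
full_rows = left_rows | right_rows

print(sorted(left_rows, key=repr))
print(sorted(right_rows, key=repr))
print(sorted(full_rows, key=repr))
```

Left and right outer joins each preserve one side only, while the full join is the only one that has to retain unmatched rows from both inputs.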

How to optimize a query with several join?

I have to write, on paper, a good physical plan for a PostgreSQL query with several natural joins. Is it the same as treating a query with a simple join, or should I use a different approach?
I am working on this one, by the way
SELECT zname
FROM Cage natural join Animal natural join DailyFeeds natural join Zookeeper
WHERE shift='const' AND clocation='const';
According to Oracle's documentation:
A NATURAL JOIN is a JOIN operation that creates an implicit join
clause for you based on the common columns in the two tables being
joined. Common columns are columns that have the same name in both
tables.
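I think the quote above answers this part of your question:
is it the same as treating a query with a simple join or should I use a different approach?
A natural join is an ordinary join on the common columns, so you can plan it the same way. I hope it helps.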
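As a quick check of that definition, the sketch below compares a NATURAL JOIN with the equivalent explicit join. It uses Python's sqlite3 module instead of PostgreSQL (an assumption so that it runs without a server), and the column names cno and aname and the sample rows are hypothetical, since the question does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: cno is the only column the two tables share.
    CREATE TABLE Cage (cno INTEGER, clocation TEXT);
    CREATE TABLE Animal (aname TEXT, cno INTEGER);
    INSERT INTO Cage VALUES (1, 'north'), (2, 'south');
    INSERT INTO Animal VALUES ('lion', 1), ('zebra', 2), ('owl', 3);
""")

# NATURAL JOIN: the join condition is implied by the shared column name (cno).
natural_rows = conn.execute(
    "SELECT aname, clocation FROM Cage NATURAL JOIN Animal ORDER BY aname"
).fetchall()

# The same join with the condition spelled out explicitly.
explicit_rows = conn.execute(
    "SELECT aname, clocation FROM Cage "
    "INNER JOIN Animal ON Cage.cno = Animal.cno ORDER BY aname"
).fetchall()

print(natural_rows)
print(natural_rows == explicit_rows)
```

For the plan on paper, this suggests first listing the common columns of each pair of tables, rewriting each natural join as an equi-join on those columns, and then applying the usual join-ordering and selection-pushdown reasoning.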

Can I apply predicates to the same columns against multiple tables in a JOIN only once?

I want to join two tables together and add additional information from two other tables to the same columns in both queried tables. I've come up with the below code, which works, but I don't feel comfortable about having to add another JOIN clause for each table, as it would make the query substantially long if I wanted to join/add more things.
Is there a way to combine it, so that I can join additional tables only once (just use S and E aliases every time)?
SELECT
    J.JobId,
    J.StandardJobId,
    S.JobName,
    J.EngineerId,
    E.EngineerName,
    JF.JobId AS FollowUpJobId,
    JF.StandardJobId AS FollowUpStandardJobId,
    SF.JobName AS FollowUpJobName,
    JF.EngineerId AS FollowUpEngineerId,
    EF.EngineerName AS FollowUpEngineerName
FROM
    Jobs J
INNER JOIN
    Jobs JF
ON
    J.FollowUpJobId = JF.JobId
INNER JOIN
    StandardJobs S
ON
    J.StandardJobId = S.StandardJobId
INNER JOIN
    Engineers E
ON
    E.EngineerId = J.EngineerId
INNER JOIN
    StandardJobs SF
ON
    SF.StandardJobId = JF.StandardJobId
INNER JOIN
    Engineers EF
ON
    EF.EngineerId = JF.EngineerId
One approach would be to use a Common Table Expression (CTE) - something like:
with cte as
(SELECT J.JobId,
        J.StandardJobId,
        S.JobName,
        J.EngineerId,
        E.EngineerName,
        J.FollowUpJobId
 FROM Jobs J
 INNER JOIN StandardJobs S ON J.StandardJobId = S.StandardJobId
 INNER JOIN Engineers E ON E.EngineerId = J.EngineerId)
SELECT O.*,
       F.StandardJobId AS FollowUpStandardJobId,
       F.JobName AS FollowUpJobName,
       F.EngineerId AS FollowUpEngineerId,
       F.EngineerName AS FollowUpEngineerName
FROM CTE AS O
JOIN CTE AS F ON O.FollowUpJobId = F.JobId
You can sort of do this with either a CTE (Common Table Expressions, the WITH clause) or a View:
;WITH Jobs_Extended As
(
    SELECT j.*,
           s.JobName,
           e.EngineerName
    FROM Jobs As j
    JOIN StandardJobs As s ON s.StandardJobId = j.StandardJobId
    JOIN Engineers As e ON e.EngineerId = j.EngineerId
)
SELECT
    J.JobId,
    J.StandardJobId,
    J.JobName,
    J.EngineerId,
    J.EngineerName,
    JF.JobId AS FollowUpJobId,
    JF.StandardJobId AS FollowUpStandardJobId,
    JF.JobName AS FollowUpJobName,
    JF.EngineerId AS FollowUpEngineerId,
    JF.EngineerName AS FollowUpEngineerName
FROM Jobs_Extended J
JOIN Jobs_Extended JF ON J.FollowUpJobId = JF.JobId
In this example the CTE Jobs_Extended becomes a defined alias for the relationship between the Jobs, Engineers and StandardJobs tables. Then once defined, you can use it multiple times in the query without having to redefine those interior relations.
You can do the same thing by changing the WITH to a view, which will make the defined alias permanent in your database.
No, you cannot avoid JOINing related tables each time a separate reference is needed. The issue is that you are not working with the tables in a general sense but instead working with the specific rows of each table, even more specifically, just those rows that match the JOIN and WHERE conditions.
There is no way to specify the references to either StandardJobs or Engineers only once because you are needing to work with two rows from each table at the same time, at least in the given example.
However, depending on which direction you want to go with "additional tables" (more references to Jobs, or more lookups like StandardJobs and Engineers for the given two references to Jobs), the CTE construct shown by Mark is probably the easiest and best way to abstract it. I posted this answer mainly to explain the issue at hand.
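For anyone who wants to try the CTE approach end to end, here is a runnable sketch. It uses Python's sqlite3 module in place of SQL Server (an assumption; the WITH syntax used here is the same), and the sample rows are invented for illustration. The lookups are written once inside Jobs_Extended, which is then referenced twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StandardJobs (StandardJobId INTEGER, JobName TEXT);
    CREATE TABLE Engineers (EngineerId INTEGER, EngineerName TEXT);
    CREATE TABLE Jobs (JobId INTEGER, StandardJobId INTEGER,
                       EngineerId INTEGER, FollowUpJobId INTEGER);
    -- Invented sample data: job 100 is followed up by job 101.
    INSERT INTO StandardJobs VALUES (10, 'Install'), (20, 'Inspect');
    INSERT INTO Engineers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Jobs VALUES (100, 10, 1, 101), (101, 20, 2, NULL);
""")

query = """
WITH Jobs_Extended AS (
    -- The StandardJobs and Engineers lookups are written only once here.
    SELECT j.JobId, j.FollowUpJobId, s.JobName, e.EngineerName
    FROM Jobs AS j
    JOIN StandardJobs AS s ON s.StandardJobId = j.StandardJobId
    JOIN Engineers AS e ON e.EngineerId = j.EngineerId
)
SELECT J.JobId, J.JobName, J.EngineerName,
       JF.JobId AS FollowUpJobId,
       JF.JobName AS FollowUpJobName,
       JF.EngineerName AS FollowUpEngineerName
FROM Jobs_Extended AS J
JOIN Jobs_Extended AS JF ON J.FollowUpJobId = JF.JobId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The query text mentions each lookup table once, but the engine still has to resolve the lookups for both the job and its follow-up, which is the point the last answer makes. Job 101 has no follow-up, so the inner self-join drops it.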

t-sql condition placement

Should SQL Server yield the same results for both of the queries below? The primary difference is the condition being placed in the WHERE clause with the former, and with the latter being placed as a condition upon the join itself.
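-- Condition in the WHERE clause
SELECT *
FROM cars c
INNER JOIN parts p
    ON c.CarID = p.CarID
WHERE p.[Desc] LIKE '%muffler%'
-- Condition in the join itself ([Desc] is bracketed because DESC is a reserved word)
SELECT *
FROM cars c
INNER JOIN parts p
    ON c.CarID = p.CarID
    AND p.[Desc] LIKE '%muffler%'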
Thanks in advance for any help that I receive upon this!
For INNER JOINs it makes no difference to semantics or performance; both give the same plan. For OUTER JOINs it does make a difference, though.
/*Will return all rows from cars*/
SELECT c.*
FROM cars c
LEFT JOIN parts p
ON c.CarID = p.CarID AND c.CarID <> c.CarID
/*Will return no rows*/
SELECT c.*
FROM cars c
LEFT JOIN parts p
ON c.CarID = p.CarID
WHERE c.CarID <> c.CarID
For inner joins the only issue is clarity. The JOIN condition should (IMO) only contain predicates concerned with how the two tables in the JOIN are related. Other unrelated filters should go in the WHERE clause.
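The same effect can be seen with the muffler filter from the question. The sketch below runs all four combinations (INNER or LEFT join, filter in ON or in WHERE) using Python's sqlite3 module as a stand-in for SQL Server (an assumption so it runs locally). The Model column, the sample rows, and the run helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (CarID INTEGER, Model TEXT);    -- Model is hypothetical
    CREATE TABLE parts (CarID INTEGER, [Desc] TEXT);  -- DESC is reserved, so bracket it
    INSERT INTO cars VALUES (1, 'sedan'), (2, 'coupe');
    INSERT INTO parts VALUES (1, 'rear muffler'), (2, 'wiper blade');
""")

def run(join_type, placement):
    """Run the cars/parts query with the filter attached via AND (in ON) or WHERE."""
    sql = (
        "SELECT c.CarID, p.[Desc] FROM cars c "
        f"{join_type} JOIN parts p ON c.CarID = p.CarID "
        f"{placement} p.[Desc] LIKE '%muffler%' ORDER BY c.CarID"
    )
    return conn.execute(sql).fetchall()

inner_on = run("INNER", "AND")       # filter is part of the join condition
inner_where = run("INNER", "WHERE")  # filter applied after the join
left_on = run("LEFT", "AND")         # unmatched cars survive with NULL parts
left_where = run("LEFT", "WHERE")    # NULL parts rows are filtered out again

print(inner_on == inner_where)
print(left_on)
print(left_where)
```

With the inner join the placement does not change the result. With the left join, putting the filter in ON keeps the coupe with a NULL part, while putting it in WHERE removes that row and effectively turns the query back into an inner join.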
For inner joins the two queries should yield exactly the same results. Are you seeing a difference?
Yes, they both return the same results. The only conceptual difference is whether the condition is checked during the join or afterwards.
The execution plan will be identical in your example. Next to the parse button should be the "Show execution plan" button. It will give you a clearer picture.
I think that in a more complex query with many joins, whether the condition is applied before or after the join could affect efficiency.
EDIT: Sorry, I'm assuming you're using SQL Server Management Studio.
My recommendation for this kind of situation would be:
put the JOIN condition (what establishes the "link" between the two tables) - and only that JOIN condition - after the JOIN operator
any additional conditions on one of the two joined tables belong in the regular WHERE clause
So based on that, I would always recommend writing your query this way:
SELECT
    (list of columns)
FROM
    dbo.cars c
INNER JOIN
    dbo.parts p ON c.CarID = p.CarID
WHERE
    p.[Desc] LIKE '%muffler%'
It seems "cleaner" and more expressive that way. Don't "hide" additional conditions behind a JOIN clause if they don't really belong there (that is, if they don't help establish the link between the two tables being joined).

T-SQL Left Join Symbol

What is the symbol (like *=) for doing a left join?
I've got table A and B, must always return all records from table A even if there is no records in table B.
This is the newer ANSI standard syntax, which is much clearer IMO.
SELECT *
FROM A
LEFT OUTER JOIN B
ON A.ID = B.ID
You shouldn't be using that operator, as it was deprecated in SQL Server 2008 and will be removed in future versions.
You should use the ANSI-compliant LEFT JOIN or LEFT OUTER JOIN instead.
It was deprecated for a reason. That operator's syntax is confusing (it conflicts with many languages' standard "multiply and assign" operator) and is non-standard.
Besides that, using the ANSI standard LEFT [OUTER] JOIN syntax makes it a lot easier to track down errors, believe me.
It also allows a clearer distinction between filters in the WHERE clause and join conditions.
I would also recommend using ANSI syntax for inner joins.
Avoid any ancient syntax like that. Rewrite it to include the newer "LEFT OUTER JOIN" and/or "RIGHT OUTER JOIN" syntax:
SELECT
a.*, B.*
FROM TableA a
LEFT OUTER JOIN TableB b ON a.id=b.id
Difference between * = and LEFT Outer Join
from the link:
In earlier versions of Microsoft® SQL
Server™ 2000, left and right outer
join conditions were specified in the
WHERE clause using the *= and =*
operators. In some cases, this syntax
results in an ambiguous query that can
be interpreted in more than one way.
SQL-92 compliant outer joins are
specified in the FROM clause and do
not result in this ambiguity. Because
the SQL-92 syntax is more precise,
detailed information about using the
old Transact-SQL outer join syntax in
the WHERE clause is not included with
this release. The syntax may not be
supported in a future version of SQL
Server. Any statements using the
Transact-SQL outer joins should be
changed to use the SQL-92 syntax.