How to optimize a query with several join? - postgresql

I have to write on paper a good physical plan for a Postgresql's query with several natural join, is it the same as treating a query with a simple join or should I use a different approach?
I am working on this one, by the way
SELECT zname
FROM Cage natural join Animal natural join DailyFeeds natural join Zookeeper
WHERE shift=’const’ AND clocation=’const’;

By Oracle
A NATURAL JOIN is a JOIN operation that creates an implicit join
clause for you based on the common columns in the two tables being
joined. Common columns are columns that have the same name in both
tables.
I think the above is answering following
is it the same as treating a query with a simple join or should I use a different approach?
I hope it helps.

Related

Merge join in PostgreSQL performs sort on indexed column

I am trying to optimize the following query in postgresql
SELECT ci.country_id, ci.ci_id,ci.name
FROM customer c
INNER JOIN address a ON c.a_id = a.a_id
INNER JOIN city ci ON ci.ci_id = a.ci_id
The columns customer.a_id, address.a_id, city.ci_id and adress.ci_id all have an btree index.
I wanted to use a merge join instead of a hash join as I read that a hash join not really uses indexes so I turned of the hash joins with Set enable_hashjoin=off.
My query is now according to the query plan using a merge join but it is performing always a quick sort before the merge join. I know that for merge join the columns need to be sorted but they should already be sorted through the index. Is there a way to force Postgres to use the index and not to perform the sort?
You are joining three tables. It is using two merge joins to do that, with the output of one merge join being one input of the other. The intermediate table is joined using two different columns, but it can't be ordered on two different columns simultaneously, so if you are only going to use merge joins, you need at least one sort.
This whole thing seems pointless, as the query is already very fast, and why do you care if it uses a hash join or not?

Tableau joining multiple tables

Hi, I have trouble understanding the structure of this table connection.
Let's suppose that all joins are inner join. Does this picture mean:
Orders JOIN (Orders1 JOIN People) JOIN Returns?
or
Orders JOIN Orders1 JOIN (People JOIN Returns)?
I don't understand
Why Orders1 and People are both vertically aligned and both connected with Orders table.
(As my understanding, join operation is bilateral, not trilateral. My imagination is that the joining should be all represented horizontally, looking like a chain.)
I know SQL, it would be easier to explain if write a pseudo SQL script.
The example you have taken is not very good to understand the joins/relationships in tableau.
A wire/line between two tables indicate the join on two tables with some id column (one to many OR one to one OR many to many). You can check the relationship based on which field (read column) by clicking that relationship thread(line). Why I termed this example not a good one because ORDERS table joining itself may have umpteen options. You can edit that relationship in many ways.
So, in you example, ORDERS is joined with itself (ORDERS1). (the result will depend on relationship type of course). Simultaneously ORDERS is joined with PEOPLE table. Since these tables have only one field in common, this relationship has resulted in creation of just one extra column in ORDERS result. NoW PEOPLE is also connected with RETURNS where no column is common so I am not able to understand this relationship.
A watch of this 5 minute video is recommended.
Translating this relationship will be something like..
(ORDERS JOIN (PEOPLE JOIN RETURNS)) JOIN ORDERS1
(Last JOIN outside braces is on the result of braces but with field/fields from ORDERS)

Nested Join vs Merge Join vs Hash Join in PostgreSQL

I know how the
Nested Join
Merge Join
Hash Join
works and its functionality.
I wanted to know in which situation these joins are used in Postgres
The following are a few rules of thumb:
Nested loop joins are preferred if one of the sides of the join has few rows. Nested loop joins are also used as the only option if the join condition does not use the equality operator.
Hash Joins are preferred if the join condition uses an equality operator and both sides of the join are large and the hash fits into work_mem.
Merge Joins are preferred if the join condition uses an equality operator and both sides of the join are large, but can be sorted on the join condition efficiently (for example, if there is an index on the expressions used in the join column).
A typical OLTP query that chooses only one row from one table and the associated rows from another table will always use a nested loop join as the only efficient method.
Queries that join tables with many rows (which cannot be filtered out before the join) would be very inefficient with a nested loop join and will always use a hash or merge join if the join condition allows it.
The optimizer considers each of these join strategies and uses the one that promises the lowest costs. The most important factor on which this decision is based is the estimated row count from both sides of the join. Consequently, wrong optimizer choices are usually caused by misestimates in the row counts.

T-SQL different JOIN approaches, same results, which one would you prefer?

these are 3 approaches how to make a join. I would like to hear some word on perforance of these 3 queries.
Thank you
SELECT * FROM
tableA A LEFT JOIN tableB B
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
ON B.ColumnB = A.ColumnB
WHERE ColumnX = 'XY'
Versus
SELECT * FROM
tableA A LEFT JOIN tableB B
ON B.ColumnB = A.ColumnB
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
WHERE ColumnX = 'XY'
Versus Common Table Expression
WITH T...
It does not matter.
SQL Server has a cost-based optimizer (as opposed to a rule-based optimizer). That means that the engine is able to figure out that both of your first two options are identical. Run your estimated and actual execution plans and you will see that this is the case.
The only reason you would choose one option over the other is for readability's sake. I go with your second option, because it's a lot easier to read when there are a great many joins involved. ON clauses in reverse order become quite difficult to track.
In my experience, any of the above could be quicker depending on your tables.
As you're setting up joins, you want to start with the most restrictive as possible (without negatively affecting your end result, obviously). This same logic also applies to the Where clause for the same reason. By starting with the most restrictive, you're limiting the number of rows that are being joined and thus evaluated by the Where clause and then returned/manipulated in the select clause. For my answers below regarding the three specific scenarios, I'm assuming a sufficiently complicated query that is doing more than just looking to combine data from multiple tables (i.e., queries answering specific questions).
If Table A is huge and Tables B & C are smaller and more directly related to the data you're trying to isolate, then the first option would likely be fastest.
If Table B or C are huge and Table A is more related to your desired data, the second option would likely be fastest.
As far as option 3 goes, I love CTEs but I try to only use them when I need to do so. Using a CTE will speed up your overall query if the data joined, manipulated, and returned by the CTE is only related to the rest of the query in a limited fashion. Including tables that are only partially related to your end result in your primary string of joins is going to needlessly slow down your query. If you can parse out that data into a CTE, it can run quickly by itself and then be incorporated back into the main query at the end.

Difference between INNER JOIN and WHERE?

First Query:
Select * from table1 inner join table2 on table1.Id = table2.Id
Second Query:
Select * from table1, table2 where table1.Id = table2.Id
What is difference between these query regarding performance which should one use?
The two statements you posted are logically identical. There isn't really a
practical reason to prefer one over the other, it's largely a matter of
personal style and readability. Some people prefer the INNER JOIN syntax and
some prefer just to use WHERE.
Refering to Using Inner Joins:
In the ISO standard, inner joins can
be specified in either the FROM or
WHERE clause. This is the only type of
join that ISO supports in the WHERE
clause. Inner joins specified in the
WHERE clause are known as old-style
inner joins.
Refering to Join Fundamentals:
Specifying the join conditions in the
FROM clause helps separate them from
any other search conditions that may
be specified in a WHERE clause, and is
the recommended method for specifying
joins.
Personaly, I prefer using INNER JOIN. I find it much clearer, as I can separate the join conditions from the filter conditions and using a seperate join block for each joined table.
To amplify #Akram's answer - many people prefer the inner join syntax, since it then allows you to more easily distinguish between the join conditions (how the various tables in the FROM clause relate to each other) from the filter conditions (those conditions that should be used to reduce the overall result set. There's no difference between them in this circumstance, but on larger queries, with more tables, it may improve readability to use the inner join form.
In addition, once you start considering outer joins, you pretty well need to use the infix join syntax (left outer join,right outer join), so many find a form of symmetry in using the same style for inner join. There is an older deprecated syntax for performing outer joins in the WHERE clause (using *=), but support for such joins is dying out.