Merge join in PostgreSQL performs sort on indexed column

Merge join in PostgreSQL performs sort on indexed column - postgresql

I am trying to optimize the following query in postgresql
SELECT ci.country_id, ci.ci_id,ci.name
FROM customer c
INNER JOIN address a ON c.a_id = a.a_id
INNER JOIN city ci ON ci.ci_id = a.ci_id
The columns customer.a_id, address.a_id, city.ci_id and adress.ci_id all have an btree index.
I wanted to use a merge join instead of a hash join as I read that a hash join not really uses indexes so I turned of the hash joins with Set enable_hashjoin=off.
My query is now according to the query plan using a merge join but it is performing always a quick sort before the merge join. I know that for merge join the columns need to be sorted but they should already be sorted through the index. Is there a way to force Postgres to use the index and not to perform the sort?

You are joining three tables. It is using two merge joins to do that, with the output of one merge join being one input of the other. The intermediate table is joined using two different columns, but it can't be ordered on two different columns simultaneously, so if you are only going to use merge joins, you need at least one sort.
This whole thing seems pointless, as the query is already very fast, and why do you care if it uses a hash join or not?

Related

Nested Join vs Merge Join vs Hash Join in PostgreSQL

I know how the
Nested Join
Merge Join
Hash Join
works and its functionality.
I wanted to know in which situation these joins are used in Postgres

The following are a few rules of thumb:
Nested loop joins are preferred if one of the sides of the join has few rows. Nested loop joins are also used as the only option if the join condition does not use the equality operator.
Hash Joins are preferred if the join condition uses an equality operator and both sides of the join are large and the hash fits into work_mem.
Merge Joins are preferred if the join condition uses an equality operator and both sides of the join are large, but can be sorted on the join condition efficiently (for example, if there is an index on the expressions used in the join column).
A typical OLTP query that chooses only one row from one table and the associated rows from another table will always use a nested loop join as the only efficient method.
Queries that join tables with many rows (which cannot be filtered out before the join) would be very inefficient with a nested loop join and will always use a hash or merge join if the join condition allows it.
The optimizer considers each of these join strategies and uses the one that promises the lowest costs. The most important factor on which this decision is based is the estimated row count from both sides of the join. Consequently, wrong optimizer choices are usually caused by misestimates in the row counts.

How to optimize a query with several join?

I have to write on paper a good physical plan for a Postgresql's query with several natural join, is it the same as treating a query with a simple join or should I use a different approach?
I am working on this one, by the way
SELECT zname
FROM Cage natural join Animal natural join DailyFeeds natural join Zookeeper
WHERE shift=’const’ AND clocation=’const’;

By Oracle
A NATURAL JOIN is a JOIN operation that creates an implicit join
clause for you based on the common columns in the two tables being
joined. Common columns are columns that have the same name in both
tables.
I think the above is answering following
is it the same as treating a query with a simple join or should I use a different approach?
I hope it helps.

How to do indexing of two join tables in Postgres?

I have two very big tables for example tab1 and tab2 in Postgres. I need to perform join and group by operation on these tables, to make my query fast i need indexing. Is there any way how to use index over join and group by queries ?? (as I heard indexing is not possible over join and group by)

Can I apply predicates to the same columns against multiple tables in a JOIN only once?

I want to join two tables together and add additional information from two other tables to the same columns in both queried tables. I've come up with the below code, which works, but I don't feel comfortable about having to add another JOIN clause for each table, as it would make the query substantially long if I wanted to join/add more things.
Is there a way to combine it, so that I can join additional tables only once (just use S and E aliases every time)?
SELECT
J.JobId,
J.StandardJobId,
S.JobName,
J.EngineerId,
E.EngineerName,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
SF.JobName AS FollowUpJobName,
JF.EngineerId AS FollowUpEngineerId,
EF.EngineerName AS FollowUpEngineerName
FROM
Jobs J
INNER JOIN
Jobs JF
ON
J.FollowUpJobId = JF.JobId
INNER JOIN
StandardJobs S
ON
J.StandardJobId = S.StandardJobId
INNER JOIN
Engineers E
ON
E.EngineerId = J.EngineerId
INNER JOIN
StandardJobs SF
ON
SF.StandardJobId = JF.StandardJobId
INNER JOIN
Engineers EF
ON
EF.EngineerId = JF.EngineerId

One approach would be to use a Common Table Expression (CTE) - something like:
with cte as
(SELECT J.JobId,
J.StandardJobId,
S.JobName,
J.EngineerId,
E.EngineerName,
J.FollowUpJobId
FROM Jobs J
INNER JOIN StandardJobs S ON J.StandardJobId = S.StandardJobId
INNER JOIN Engineers E ON E.EngineerId = J.EngineerId)
SELECT O.*,
F.StandardJobId AS FollowUpStandardJobId,
F.JobName AS FollowUpJobName,
F.EngineerId AS FollowUpEngineerId,
F.EngineerName AS FollowUpEngineerName
FROM CTE AS O
JOIN CTE AS F ON O.FollowUpJobId = F.JobId

You can sort of do this with either a CTE (Common Table Expressions, the WITH clause) or a View:
;WITH Jobs_Extended As
(
SELECT j.*,
s.JobName,
E.EngineerName
FROM Jobs As j
JOIN StandardJobs As s ON s.StandardJobId = j.StandardJobId
JOIN Engineer As e ON e.EngineerId = j.EngineerId
)
SELECT
J.JobId,
J.StandardJobId,
J.JobName,
J.EngineerId,
J.EngineerName,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
JF.JobName AS FollowUpJobName,
JF.EngineerId AS FollowUpEngineerId,
JF.EngineerName AS FollowUpEngineerName
FROM Jobs_Extended J
JOIN Jobs_Extended JF ON J.FollowUpJobId = JF.JobId
In this example the CTE Jobs_Extended becomes a defined alias for the relationship between the Jobs, Engineers and StandardJobs tables. Then once defined, you can use it multiple times in the query without having to redefine those interior relations.
You can do the same thing by change the WITH to a View, which will make the defined alias permannet in your database.

No, you cannot avoid JOINing related tables each time a separate reference is needed. The issue is that you are not working with the tables in a general sense but instead working with the specific rows of each table, even more specifically, just those rows that match the JOIN and WHERE conditions.
There is no way to specify the references to either StandardJobs or Engineers only once because you are needing to work with two rows from each table at the same time, at least in the given example.
However, depending on which direction you are wanting to go with "additional tables" (more references to Jobs or more lookups like StandardJobs and Engineers for the given 2 references of Jobs), the CTE construct shown by Mark is the probably the easiest / best way to abstract it. I posted this answer mainly to explain the issue at hand.

sql query to retrieve DISTINCT rows on left join

I am developing a t-sql query to return left join of two tables, but when I just select records from Table A, it gives me only 2 records. The problem though is when I left join it Table B, it gives me 4 records. How can I reduce this to just 2 records?
One problem though is that I am only aware of one PK/FK to link these two tables.

The field you are using for the join must exist more than once in table B - this is why multiple rows are being returned in the join. In order to reduce the row count you will have to either add further fields to the join, or add a where clause to filter out rows not required.
Alternatively you could use a GROUP BY statement to group the rows up, but this may not be what you need.

Remember that the left join brings you null fields from joined table.
Also you can use select(distinct), but i can't see well you issue. Can you give us more details?