How does the optimizer decide between merge join and hash join?

Database System Concepts introduces several ways to implement a join operation. Two of them are merge join and hash join.
I was wondering: when does the optimizer decide to use a merge join, and when a hash join?
In particular, from https://stackoverflow.com/a/1114288/156458
hash joins can only be used for equi-joins, but merge joins are more flexible.
But Database System Concepts says both are used only for equi-joins and natural joins:
The merge-join algorithm (also called the sort-merge-join algorithm)
can be used to compute natural joins and equi-joins.
...
Like the merge-join algorithm, the hash-join algorithm can be used to
implement natural joins and equi-joins.
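At least for PostgreSQL, the book's version is accurate: both strategies require an equality join condition (a hashable operator for hash join, a btree-mergejoinable one for merge join). A minimal sketch to observe this, assuming the tenk1/tenk2 regression-test tables used in the examples below:
-- A non-equality join condition rules out both strategies: a hash can only
-- be probed for equality, and PostgreSQL merge joins only support
-- mergejoinable "=" operators, so expect a Nested Loop in the plan.
EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique2 < t2.unique2;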
My question comes from the PostgreSQL documentation, where there are two examples, and I am not sure why one uses a merge join and the other a hash join:
EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=230.47..713.98 rows=101 width=488)
   Hash Cond: (t2.unique2 = t1.unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244)
   ->  Hash  (cost=229.20..229.20 rows=101 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
                     Index Cond: (unique1 < 100)
and
EXPLAIN SELECT *
FROM tenk1 t1, onek t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=198.11..268.19 rows=10 width=488)
   Merge Cond: (t1.unique2 = t2.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1 t1  (cost=0.29..656.28 rows=101 width=244)
         Filter: (unique1 < 100)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=244)
         Sort Key: t2.unique2
         ->  Seq Scan on onek t2  (cost=0.00..148.00 rows=1000 width=244)
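The choice between the two is purely cost-based: the planner estimates the cost of every applicable strategy for each join and keeps the cheapest. One way to see the road not taken is to disable a strategy for your session and compare the estimates; a sketch using the standard planner settings enable_hashjoin/enable_mergejoin, which only influence the planner in the current session:
-- Strongly discourage hash joins, then re-run the first example and compare
-- the estimated total cost of the resulting plan against the hash join's 713.98.
SET enable_hashjoin = off;
EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
RESET enable_hashjoin;
If the forced alternative comes out with a higher estimated cost, that is the whole explanation: for 101 rows against 10000, hashing is estimated cheaper than sorting both inputs, while against onek the merge-friendly inputs win.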

Related

Postgresql 12: performance issue with overlap operator and join on very same table

I'm having performance trouble with a "quite simple" query:
DB schema:
CREATE TABLE bigdata3.data_1_2021
(
    p_value     float8  NOT NULL,
    p_timestamp tsrange NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_data_1_2021_ts ON bigdata3.data_1_2021 USING gist (p_timestamp);
CREATE INDEX IF NOT EXISTS idx_data_1_2021_ts2 ON bigdata3.data_1_2021 USING btree (p_timestamp);
FYI, I'm using the btree_gist extension:
CREATE EXTENSION IF NOT EXISTS btree_gist;
Also, there are 19037 rows in my table. So now, the query:
WITH data_1 AS
(
    SELECT t1.p_value     AS value,
           t1.p_timestamp AS ts
    FROM "bigdata3".data_1_2021 AS t1
    WHERE TSRANGE('2021-02-01 00:00:00.000'::TIMESTAMP, '2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
)
SELECT t1.ts AS ts,
       t2.ts AS ts,
       t1.value,
       t2.value
FROM data_1 AS t1
INNER JOIN data_1 AS t2 ON t1.ts && t2.ts
This query takes 1 minute.
When I run EXPLAIN, several things seem strange to me:
QUERY PLAN
 Nested Loop  (cost=508.96..8108195.71 rows=1801582 width=80)
   Join Filter: (t1.ts && t2.ts)
   CTE data_1
     ->  Seq Scan on data_1_2021 t1_1  (cost=0.00..508.96 rows=18982 width=29)
           Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
   ->  CTE Scan on data_1 t1  (cost=0.00..379.64 rows=18982 width=40)
   ->  CTE Scan on data_1 t2  (cost=0.00..379.64 rows=18982 width=40)
1) I expect the scan on the tsrange condition to use the "idx_data_1_2021_ts" index instead of a sequential scan.
2) I expect the join to use that very same index for a hash or merge join.
The stranger part comes now:
WITH data_1 AS
(
    SELECT t1.p_value     AS value,
           t1.p_timestamp AS ts
    FROM "bigdata3".data_1_2021 AS t1
    WHERE TSRANGE('2021-02-01 00:00:00.000'::TIMESTAMP, '2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
),
data_2 AS
(
    SELECT t1.p_value     AS value,
           t1.p_timestamp AS ts
    FROM "bigdata3".data_1_2021 AS t1
    WHERE TSRANGE('2021-02-01 00:00:00.000'::TIMESTAMP, '2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
)
SELECT t1.ts AS ts,
       t2.ts AS ts,
       t1.value,
       t2.value
FROM data_1 AS t1
INNER JOIN data_2 AS t2 ON t1.ts && t2.ts
All I did was duplicate data_1 as data_2 and change the join to join data_1 with data_2:
 Nested Loop  (cost=0.28..116154.41 rows=1801582 width=58)
   ->  Seq Scan on data_1_2021 t1  (cost=0.00..508.96 rows=18982 width=29)
         Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
   ->  Index Scan using idx_data_1_2021_ts on data_1_2021 t1_1  (cost=0.28..4.19 rows=190 width=29)
         Index Cond: ((p_timestamp && t1.p_timestamp) AND (p_timestamp && '["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange))
The query takes 1 second and now uses the index!
But it's still not perfect, because of the seq scan and the nested loop.
Another piece of info: switching to the = operator on the join makes the first case faster, but the second case slower ...
Does anybody have an explanation for why it is not properly using the index when joining the very same table? I'll also take any advice on making this query faster.
Many thanks,
Clément
PS: I know this query can look stupid; I simplified my real case to point out my issue.
Edit 1: As requested, the EXPLAIN (ANALYZE, BUFFERS) output of the first query:
QUERY PLAN
 Nested Loop  (cost=509.04..8122335.52 rows=1802721 width=40) (actual time=0.025..216996.205 rows=19680 loops=1)
   Join Filter: (t1.ts && t2.ts)
   Rows Removed by Join Filter: 359841220
   Buffers: shared hit=271
   CTE data_1
     ->  Seq Scan on data_1_2021 t1_1  (cost=0.00..509.04 rows=18988 width=29) (actual time=0.013..38.263 rows=18970 loops=1)
           Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
           Rows Removed by Filter: 73
           Buffers: shared hit=271
   ->  CTE Scan on data_1 t1  (cost=0.00..379.76 rows=18988 width=40) (actual time=0.016..8.083 rows=18970 loops=1)
         Buffers: shared hit=1
   ->  CTE Scan on data_1 t2  (cost=0.00..379.76 rows=18988 width=40) (actual time=0.000..4.723 rows=18970 loops=18970)
         Buffers: shared hit=270
 Planning Time: 0.176 ms
 Execution Time: 217208.300 ms
And the second:
QUERY PLAN
 Nested Loop  (cost=0.28..116190.34 rows=1802721 width=58) (actual time=280.133..817.611 rows=19680 loops=1)
   Buffers: shared hit=76361
   ->  Seq Scan on data_1_2021 t1  (cost=0.00..509.04 rows=18988 width=29) (actual time=0.030..7.909 rows=18970 loops=1)
         Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
         Rows Removed by Filter: 73
         Buffers: shared hit=271
   ->  Index Scan using idx_data_1_2021_ts on data_1_2021 t1_1  (cost=0.28..4.19 rows=190 width=29) (actual time=0.041..0.042 rows=1 loops=18970)
         Index Cond: ((p_timestamp && t1.p_timestamp) AND (p_timestamp && '["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange))
         Buffers: shared hit=76090
 Planning Time: 709.820 ms
 Execution Time: 981.659 ms
There are too many questions here, I'll answer the first two:
The index is not used, because the query fetches almost all the rows from the table anyway.
Hash or merge joins can only be used with join conditions that use the = operator. This is quite obvious: a hash can only be probed for equality, and a merge join requires sorting and a total order.
Because your CTE is referenced twice in the query, the planner automatically materializes it. Once materialized, it can't use the index on the underlying table anymore. (That is, it can't use it for the highly selective condition t1.ts && t2.ts. It could still use it for the "first half of February" condition, as that occurs prior to the materialization, but since that condition is so non-selective, it chooses not to.)
You can force it not to materialize it:
WITH data_1 AS NOT MATERIALIZED (...
In my hands, doing this produces the same execution plan as writing two separate CTEs, each of which is referenced only once.
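For reference, here is the first query with the CTE inlined; a sketch, unchanged from the question except for the NOT MATERIALIZED keyword (which requires PostgreSQL 12 or later):
WITH data_1 AS NOT MATERIALIZED
(
    -- Inlining the CTE keeps the gist index on p_timestamp
    -- visible to the join condition below.
    SELECT t1.p_value     AS value,
           t1.p_timestamp AS ts
    FROM "bigdata3".data_1_2021 AS t1
    WHERE TSRANGE('2021-02-01 00:00:00.000'::TIMESTAMP, '2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
)
SELECT t1.ts AS ts,
       t2.ts AS ts,
       t1.value,
       t2.value
FROM data_1 AS t1
INNER JOIN data_1 AS t2 ON t1.ts && t2.ts;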

Very slow query on GROUP BY

I am having a really slow query (~100 mins). I have omitted a lot of the inner child nodes, denoting them with a trailing ...
 HashAggregate  (cost=6449635645.84..6449635742.59 rows=1290 width=112) (actual time=5853093.882..5853095.159 rows=785 loops=1)
   Group Key: p.processid
   ->  Nested Loop  (cost=10851145.36..6449523319.09 rows=832050 width=112) (actual time=166573.289..5853043.076 rows=3904 loops=1)
         Join Filter: (SubPlan 2)
         Rows Removed by Join Filter: 617040
         ->  Merge Left Join  (cost=5425572.68..5439530.95 rows=1290 width=799) (actual time=80092.782..80114.828 rows=788 loops=1) ...
         ->  Materialize  (cost=5425572.68..5439550.30 rows=1290 width=112) (actual time=109.689..109.934 rows=788 loops=788) ...
         SubPlan 2
           ->  Limit  (cost=3869.12..3869.13 rows=5 width=8) (actual time=9.155..9.156 rows=5 loops=620944) ...
 Planning time: 1796.764 ms
 Execution time: 5853316.418 ms
(2836 rows)
The above query plan comes from a query against the view, whose definition is below (simplified):
create or replace view foo_bar_view(processid, column_1, run_count) as
SELECT q.processid,
       q.column_1,
       q.run_count
FROM (
    SELECT r.processid,
           avg(h.some_column) AS column_1,
           -- many more aggregate functions on many more columns
           count(1) AS run_count
    FROM foo_bar_table r,
         foo_bar_table h
    WHERE (h.processid IN (SELECT p.processid
                           FROM process p
                           LEFT JOIN bar i ON p.barid = i.id
                           LEFT JOIN foo ii ON i.fooid = ii.fooid
                           JOIN foofoobar pt ON p.typeid = pt.typeid AND pt.displayname ~~
                               ((SELECT ('%'::text || property.value) || '%'::text
                                 FROM property
                                 WHERE property.name = 'something'::text))
                           WHERE p.processid < r.processid
                             AND (ii.name = r.foo_name OR ii.name IS NULL AND r.foo_name IS NULL)
                           ORDER BY p.processid DESC
                           LIMIT 5))
    GROUP BY r.processid
) q;
I would just like to understand: does this mean that most of the time is spent performing the GROUP BY processid?
If not, what is causing the issue? I can't think of a reason why this query is so slow.
The aggregate functions used are avg, min, max, stddev.
A total of 52 of them were used, 4 on each of the 13 columns.
Update: Expanding on the child node of SubPlan 2, we can see that the Bitmap Index Scan on process_pkey is the bottleneck.
 ->  Bitmap Heap Scan on process p_30  (cost=1825.89..3786.00 rows=715 width=24) (actual time=8.642..8.833 rows=394 loops=620944)
       Recheck Cond: ((typeid = pt_30.typeid) AND (processid < p.processid))
       Heap Blocks: exact=185476288
       ->  BitmapAnd  (cost=1825.89..1825.89 rows=715 width=0) (actual time=8.611..8.611 rows=0 loops=620944)
             ->  Bitmap Index Scan on ix_process_typeid  (cost=0.00..40.50 rows=2144 width=0) (actual time=0.077..0.077 rows=788 loops=620944)
                   Index Cond: (typeid = pt_30.typeid)
             ->  Bitmap Index Scan on process_pkey  (cost=0.00..1761.20 rows=95037 width=0) (actual time=8.481..8.481 rows=145093 loops=620944)
                   Index Cond: (processid < p.processid)
What I am unable to figure out is why it is using a Bitmap Index Scan and not an Index Scan. From what it seems, there should only be 788 rows that need to be compared? Wouldn't that be faster? If not, how can I optimise this query?
processid is of bigint type and has an index
The complete execution plan is here.
You conveniently left out the names of the tables in the execution plan, but I assume that the nested loop join is between foo_bar_table r and foo_bar_table h, and the subplan is the IN condition.
The high execution time is caused by the subplan, which is executed once for each potential join result, that is, 788 * 788 = 620944 times; 620944 * 9.156 ms accounts for 5685363 milliseconds.
Create this index:
CREATE INDEX ON process (typeid, processid, installationid);
And run VACUUM:
VACUUM process;
That should give you a fast index-only scan.
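The VACUUM matters because an index-only scan can skip heap fetches only for pages that the visibility map marks as all-visible, and VACUUM is what sets those bits. A quick way to check coverage (relpages and relallvisible are standard pg_class columns):
-- After the VACUUM, relallvisible should be close to relpages;
-- otherwise the "index-only" scan still has to visit the heap.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'process';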

PostgreSQL doesn't use index on Join

Let's say we have the following two tables:
purchases
  -> id
  -> classic_id (indexed TEXT)
  -> other columns
purchase_items_2 (a temporary table)
  -> id
  -> order_id (indexed TEXT)
  -> other columns
I want to do a SQL join between the two tables like so:
SELECT pi.id, pi.order_id, p.id
FROM purchase_items_2 pi
INNER JOIN purchases p ON pi.order_id = p.classic_id
This should use the indexes, no? It does not.
Any clue why?
This is the EXPLAIN output of the query:
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=433.80..744.69 rows=5848 width=216)
   Hash Cond: ((purchase_items_2.order_id)::text = (purchases.classic_id)::text)
   ->  Seq Scan on purchase_items_2  (cost=0.00..230.48 rows=5848 width=208)
   ->  Hash  (cost=282.80..282.80 rows=12080 width=16)
         ->  Seq Scan on purchases  (cost=0.00..282.80 rows=12080 width=16)
(5 rows)
When I run a query with a WHERE clause:
SELECT pi.id
FROM purchase_items_2 pi
WHERE pi.order_id = 'gigel'
it uses the index:
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on purchase_items_2  (cost=4.51..80.78 rows=29 width=208)
   Recheck Cond: ((order_id)::text = 'gigel'::text)
   ->  Bitmap Index Scan on index_purchase_items_2_on_order_id  (cost=0.00..4.50 rows=29 width=0)
         Index Cond: ((order_id)::text = 'gigel'::text)
(4 rows)
Since you have no WHERE condition, the query has to read all rows of both tables anyway. And since the hash table built by the hash join fits in work_mem, a hash join (that has to perform a sequential scan on both tables) is the most efficient join strategy.
PostgreSQL doesn't use the indexes because it is faster without them in this specific query.
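If you want to see that for yourself, you can discourage sequential scans for your session and compare the estimates; enable_seqscan is a standard planner setting (a sketch for experimentation, not something to leave set):
-- With seq scans discouraged, the planner is pushed toward the indexes;
-- compare the resulting estimated cost with the hash join's 744.69 above.
SET enable_seqscan = off;
EXPLAIN SELECT pi.id, pi.order_id, p.id
FROM purchase_items_2 pi
INNER JOIN purchases p ON pi.order_id = p.classic_id;
RESET enable_seqscan;
The index-driven plan should come out with a higher estimated cost, which is precisely why the planner did not choose it.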

Why isn't PostgreSQL using an index in a Merge Join scenario?

explain select count(1) from tab1_201502 t1, tab2_201502 t2
where t1.serv_no = t2.serv_no
  and t1.PC_LOGIN_COUNT1 > 5
  and t1.FET_WZ_FEE < 80
  and t2.ALL_FLOW_2G < 50;
                              QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=4358706.25..4358706.26 rows=1 width=0)
   ->  Merge Join  (cost=4339930.99..4358703.30 rows=1179 width=0)
         Merge Cond: ((t1.serv_no)::text = (t2.serv_no)::text)
         ->  Index Scan using tab1_201502_serv_no_idx on tab1_201502 t1  (cost=0.56..6239071.57 rows=263219 width=12)
               Filter: ((pc_login_count1 > 5::numeric) AND (fet_wz_fee < 80::numeric))
         ->  Sort  (cost=4339914.76..4340306.63 rows=156747 width=12)
               Sort Key: t2.serv_no
               ->  Seq Scan on tab2_201502 t2  (cost=0.00..4326389.00 rows=156747 width=12)
                     Filter: (all_flow_2g < 50::numeric)
All tables are indexed on serv_no.
Why is PostgreSQL ignoring the tab2_201502 index for the scan?
This is your query:
select count(1)
from tab1_201502 t1 join
     tab2_201502 t2
     on t1.serv_no = t2.serv_no
where t1.PC_LOGIN_COUNT1 > 5 and t1.FET_WZ_FEE < 80 and t2.ALL_FLOW_2G < 50;
Postgres is deciding that filtering by the where clause is more important than performing the join.
I would recommend trying two sets of indexes for this query. The first pair is: tab2_201502(ALL_FLOW_2G, serv_no) and tab1_201502(serv_no, PC_LOGIN_COUNT1, FET_WZ_FEE).
The second pair is: tab1_201502(PC_LOGIN_COUNT1, FET_WZ_FEE, serv_no) and tab2_201502(serv_no, ALL_FLOW_2G).
Which works better depends on which table is the driving table for the join.
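Spelled out as DDL, the first pair would look like this (the index names are illustrative, not prescribed):
-- Lets the planner filter tab2 on ALL_FLOW_2G and join on serv_no from the index.
CREATE INDEX tab2_201502_flow_serv_idx ON tab2_201502 (ALL_FLOW_2G, serv_no);
-- Lets the planner probe tab1 on serv_no with both filter columns in the index.
CREATE INDEX tab1_201502_serv_filter_idx ON tab1_201502 (serv_no, PC_LOGIN_COUNT1, FET_WZ_FEE);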

Will Postgres push down a WHERE clause into a VIEW with a Window Function (Aggregate)?

The docs for Pg's Window function say:
The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways by means of different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
However, I'm not seeing this. It seems to me that the filter sits very near the left margin and the top of the plan, i.e., it is one of the last things done:
=# EXPLAIN SELECT * FROM chrome_nvd.view_options where fkey_style = 303451;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Subquery Scan view_options  (cost=2098450.26..2142926.28 rows=14825 width=180)
   Filter: (view_options.fkey_style = 303451)
   ->  Sort  (cost=2098450.26..2105862.93 rows=2965068 width=189)
         Sort Key: o.sequence
         ->  WindowAgg  (cost=1446776.02..1506077.38 rows=2965068 width=189)
               ->  Sort  (cost=1446776.02..1454188.69 rows=2965068 width=189)
                     Sort Key: h.name, k.name
                     ->  WindowAgg  (cost=802514.45..854403.14 rows=2965068 width=189)
                           ->  Sort  (cost=802514.45..809927.12 rows=2965068 width=189)
                                 Sort Key: h.name
                                 ->  Hash Join  (cost=18.52..210141.57 rows=2965068 width=189)
                                       Hash Cond: (o.fkey_opt_header = h.id)
                                       ->  Hash Join  (cost=3.72..169357.09 rows=2965068 width=166)
                                             Hash Cond: (o.fkey_opt_kind = k.id)
                                             ->  Seq Scan on options o  (cost=0.00..128583.68 rows=2965068 width=156)
                                             ->  Hash  (cost=2.21..2.21 rows=121 width=18)
                                                   ->  Seq Scan on opt_kind k  (cost=0.00..2.21 rows=121 width=18)
                                       ->  Hash  (cost=8.80..8.80 rows=480 width=31)
                                             ->  Seq Scan on opt_header h  (cost=0.00..8.80 rows=480 width=31)
(19 rows)
These two WindowAggs essentially change the plan into something that seems to never finish, compared to the much faster:
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan view_options  (cost=329.47..330.42 rows=76 width=164) (actual time=20.263..20.403 rows=42 loops=1)
   ->  Sort  (cost=329.47..329.66 rows=76 width=189) (actual time=20.258..20.300 rows=42 loops=1)
         Sort Key: o.sequence
         Sort Method: quicksort  Memory: 35kB
         ->  Hash Join  (cost=18.52..327.10 rows=76 width=189) (actual time=19.427..19.961 rows=42 loops=1)
               Hash Cond: (o.fkey_opt_header = h.id)
               ->  Hash Join  (cost=3.72..311.25 rows=76 width=166) (actual time=17.679..18.085 rows=42 loops=1)
                     Hash Cond: (o.fkey_opt_kind = k.id)
                     ->  Index Scan using options_pkey on options o  (cost=0.00..306.48 rows=76 width=156) (actual time=17.152..17.410 rows=42 loops=1)
                           Index Cond: (fkey_style = 303451)
                     ->  Hash  (cost=2.21..2.21 rows=121 width=18) (actual time=0.432..0.432 rows=121 loops=1)
                           ->  Seq Scan on opt_kind k  (cost=0.00..2.21 rows=121 width=18) (actual time=0.042..0.196 rows=121 loops=1)
               ->  Hash  (cost=8.80..8.80 rows=480 width=31) (actual time=1.687..1.687 rows=480 loops=1)
                     ->  Seq Scan on opt_header h  (cost=0.00..8.80 rows=480 width=31) (actual time=0.030..0.748 rows=480 loops=1)
 Total runtime: 20.893 ms
(15 rows)
What is going on, and how do I fix it? I'm using Postgresql 8.4.8. Here is what the actual view is doing:
SELECT o.fkey_style, h.name AS header, k.name AS kind
, o.code, o.name AS option_name, o.description
, count(*) OVER (PARTITION BY h.name) AS header_count
, count(*) OVER (PARTITION BY h.name, k.name) AS header_kind_count
FROM chrome_nvd.options o
JOIN chrome_nvd.opt_header h ON h.id = o.fkey_opt_header
JOIN chrome_nvd.opt_kind k ON k.id = o.fkey_opt_kind
ORDER BY o.sequence;
No, PostgreSQL will only push down a WHERE clause into a VIEW that does not have an aggregate. (Window functions are considered aggregates.)
< x> I think that's just an implementation limitation
< EvanCarroll> x: I wonder what would have to be done to push the WHERE clause down in this case.
< EvanCarroll> the planner would have to know that the WindowAgg doesn't itself add selectivity and therefore it is safe to push the WHERE down?
< x> EvanCarroll; a lot of very complicated work with the planner, I'd presume
And,
< a> EvanCarroll: nope. a filter condition on a view applies to the output of the view and only gets pushed down if the view does not involve aggregates
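Given that limitation, the practical workaround is to apply the filter before the window functions run, e.g. by querying the view's body directly with the WHERE clause inside. A sketch based on the view definition above; note that this changes the semantics: header_count and header_kind_count are then computed over the filtered rows only, which may or may not be what you want:
SELECT o.fkey_style, h.name AS header, k.name AS kind,
       o.code, o.name AS option_name, o.description,
       -- the counts now see only rows for this style, because the WHERE
       -- clause is applied before the window functions (per the docs quoted above)
       count(*) OVER (PARTITION BY h.name) AS header_count,
       count(*) OVER (PARTITION BY h.name, k.name) AS header_kind_count
FROM chrome_nvd.options o
JOIN chrome_nvd.opt_header h ON h.id = o.fkey_opt_header
JOIN chrome_nvd.opt_kind k ON k.id = o.fkey_opt_kind
WHERE o.fkey_style = 303451
ORDER BY o.sequence;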