Will Postgres push down a WHERE clause into a VIEW with a Window Function (Aggregate)? - postgresql

The docs for Pg's Window function say:
The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways by means of different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
However, I'm not seeing this. The SELECT's filter appears at the outermost level of the plan, applied last, rather than being pushed down into the view:
=# EXPLAIN SELECT * FROM chrome_nvd.view_options where fkey_style = 303451;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Subquery Scan view_options  (cost=2098450.26..2142926.28 rows=14825 width=180)
   Filter: (view_options.fkey_style = 303451)
   ->  Sort  (cost=2098450.26..2105862.93 rows=2965068 width=189)
         Sort Key: o.sequence
         ->  WindowAgg  (cost=1446776.02..1506077.38 rows=2965068 width=189)
               ->  Sort  (cost=1446776.02..1454188.69 rows=2965068 width=189)
                     Sort Key: h.name, k.name
                     ->  WindowAgg  (cost=802514.45..854403.14 rows=2965068 width=189)
                           ->  Sort  (cost=802514.45..809927.12 rows=2965068 width=189)
                                 Sort Key: h.name
                                 ->  Hash Join  (cost=18.52..210141.57 rows=2965068 width=189)
                                       Hash Cond: (o.fkey_opt_header = h.id)
                                       ->  Hash Join  (cost=3.72..169357.09 rows=2965068 width=166)
                                             Hash Cond: (o.fkey_opt_kind = k.id)
                                             ->  Seq Scan on options o  (cost=0.00..128583.68 rows=2965068 width=156)
                                             ->  Hash  (cost=2.21..2.21 rows=121 width=18)
                                                   ->  Seq Scan on opt_kind k  (cost=0.00..2.21 rows=121 width=18)
                                       ->  Hash  (cost=8.80..8.80 rows=480 width=31)
                                             ->  Seq Scan on opt_header h  (cost=0.00..8.80 rows=480 width=31)
(19 rows)
These two WindowAgg nodes essentially change the plan into something that seems to never finish, compared with this much faster plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan view_options  (cost=329.47..330.42 rows=76 width=164) (actual time=20.263..20.403 rows=42 loops=1)
   ->  Sort  (cost=329.47..329.66 rows=76 width=189) (actual time=20.258..20.300 rows=42 loops=1)
         Sort Key: o.sequence
         Sort Method: quicksort Memory: 35kB
         ->  Hash Join  (cost=18.52..327.10 rows=76 width=189) (actual time=19.427..19.961 rows=42 loops=1)
               Hash Cond: (o.fkey_opt_header = h.id)
               ->  Hash Join  (cost=3.72..311.25 rows=76 width=166) (actual time=17.679..18.085 rows=42 loops=1)
                     Hash Cond: (o.fkey_opt_kind = k.id)
                     ->  Index Scan using options_pkey on options o  (cost=0.00..306.48 rows=76 width=156) (actual time=17.152..17.410 rows=42 loops=1)
                           Index Cond: (fkey_style = 303451)
                     ->  Hash  (cost=2.21..2.21 rows=121 width=18) (actual time=0.432..0.432 rows=121 loops=1)
                           ->  Seq Scan on opt_kind k  (cost=0.00..2.21 rows=121 width=18) (actual time=0.042..0.196 rows=121 loops=1)
               ->  Hash  (cost=8.80..8.80 rows=480 width=31) (actual time=1.687..1.687 rows=480 loops=1)
                     ->  Seq Scan on opt_header h  (cost=0.00..8.80 rows=480 width=31) (actual time=0.030..0.748 rows=480 loops=1)
 Total runtime: 20.893 ms
(15 rows)
What is going on, and how do I fix it? I'm using Postgresql 8.4.8. Here is what the actual view is doing:
SELECT o.fkey_style, h.name AS header, k.name AS kind
, o.code, o.name AS option_name, o.description
, count(*) OVER (PARTITION BY h.name) AS header_count
, count(*) OVER (PARTITION BY h.name, k.name) AS header_kind_count
FROM chrome_nvd.options o
JOIN chrome_nvd.opt_header h ON h.id = o.fkey_opt_header
JOIN chrome_nvd.opt_kind k ON k.id = o.fkey_opt_kind
ORDER BY o.sequence;

No, PostgreSQL will only push down a WHERE clause on a VIEW that does not have an Aggregate. (Window functions are considered Aggregates.)
< x> I think that's just an implementation limitation
< EvanCarroll> x: I wonder what would have to be done to push the WHERE clause down in this case.
< EvanCarroll> the planner would have to know that the WindowAgg doesn't itself add selectivity and therefore it is safe to push the WHERE down?
< x> EvanCarroll: a lot of very complicated work with the planner, I'd presume
And,
< a> EvanCarroll: nope. a filter condition on a view applies to the output of the view and only gets pushed down if the view does not involve aggregates
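Since the push-down won't happen automatically, the usual workaround is to apply the filter before the window functions run, by inlining the view body. A minimal sketch (with one important caveat: header_count and header_kind_count are then computed over the filtered rows only, which may or may not match what the view intends):

```sql
-- Sketch of a workaround: filter inside the query, before the
-- WindowAggs, instead of relying on push-down through the view.
-- Caveat: the window counts now only see rows for this one style.
SELECT o.fkey_style, h.name AS header, k.name AS kind
     , o.code, o.name AS option_name, o.description
     , count(*) OVER (PARTITION BY h.name)         AS header_count
     , count(*) OVER (PARTITION BY h.name, k.name) AS header_kind_count
FROM chrome_nvd.options o
JOIN chrome_nvd.opt_header h ON h.id = o.fkey_opt_header
JOIN chrome_nvd.opt_kind k ON k.id = o.fkey_opt_kind
WHERE o.fkey_style = 303451
ORDER BY o.sequence;
```

With the filter inside, the planner can use the index on fkey_style before any sorting or window aggregation, as in the fast plan above.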

Related

How to interpret PostgreSQL EXPLAIN results when query hangs

I have no idea how to simplify this problem, so this is going to be a long question.
For openers, for reasons I won't get into, I normalized out long paragraphs to a table named shared.notes.
Next I have a complicated view with a number of paragraph lookups. Each note_id field is (a) indexed and (b) has a foreign key constraint to the notes table. Pseudo code below:
CREATE VIEW shared.vw_get_the_whole_kit_and_kaboodle AS
SELECT
yada yada
, mi.electrical_note_id
, electrical_notes.note AS electrical_notes
, mi.hvac_note_id
, hvac_notes.note AS hvac_notes
, mi.network_note_id
, network_notes.note AS network_notes
, mi.plumbing_note_id
, plumbing_notes.note AS plumbing_notes
, mi.specification_note_id
, specification_notes.note AS specification_notes
, mi.structural_note_id
, structural_notes.note AS structural_notes
FROM shared.a_table AS mi
JOIN shared.generic_items AS gi
ON mi.generic_item_id = gi.generic_item_id
JOIN shared.manufacturers AS mft
ON mi.manufacturer_id = mft.manufacturer_id
JOIN shared.notes AS electrical_notes
ON mi.electrical_note_id = electrical_notes.note_id
JOIN shared.notes AS hvac_notes
ON mi.hvac_note_id = hvac_notes.note_id
JOIN shared.notes AS plumbing_notes
ON mi.plumbing_note_id = plumbing_notes.note_id
JOIN shared.notes AS specification_notes
ON mi.specification_note_id = specification_notes.note_id
JOIN shared.notes AS structural_notes
ON mi.structural_note_id = structural_notes.note_id
JOIN shared.notes AS network_notes
ON mi.network_note_id = network_notes.note_id
JOIN shared.connectivity AS nc
ON mi.connectivity_id = nc.connectivity_id
WHERE
mi.deletion_date IS NULL;
Then I select against this view:
SELECT
lots of columns...
FROM shared.vw_get_the_whole_kit_and_kaboodle
WHERE
is_active = TRUE
AND is_inventory = FALSE;
Strangely, in the cloud GCP databases, I've not run into problems yet, and there are thousands of rows involved in a number of these tables.
Meanwhile back at the ranch, on my local PC, I've got a test version of the database. SAME EXACT SQL, down to the last letter. Trust me on that. For table definitions, view definitions, indexes... everything.
The cloud will return queries nearly instantaneously.
The local PC will hang--this despite the fact that the PC database has a mere handful of rows each in the various tables. So if one should hang, it ought to be the cloud databases. But it's the other way around; the tiny-dataset database is the one that fails.
Add this plot twist in: if I remove the filter for is_inventory, the query on the PC returns instantaneously. Also, if I just remove, one by one, the joins to the notes table, after about half of them are gone, the PC starts to finish instantaneously. It's almost like it's upset to be hitting the same table so many times with one query.
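One thing worth checking when behavior flips after removing about half the joins (this is an assumption on my part, not a confirmed diagnosis for this case): the planner stops exhaustively reordering joins past its join-search limits, so comparing the query's join count against these settings can be illuminating:

```sql
-- Hypothetical diagnostic: beyond join_collapse_limit /
-- from_collapse_limit (default 8) the planner no longer considers all
-- join orders, and at geqo_threshold (default 12) it switches to
-- genetic (randomized) join search.
SHOW join_collapse_limit;
SHOW from_collapse_limit;
SHOW geqo_threshold;
```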
If I run EXPLAIN (without the ANALYZE option), here's the NO-hang version:
 Hash Left Join  (cost=31.55..40.09 rows=43 width=751)
   Hash Cond: (mi.mounting_location_id = ml.mounting_location_id)
   ->  Hash Left Join  (cost=30.34..38.76 rows=43 width=719)
         Hash Cond: (mi.price_type_id = pt.price_type_id)
         ->  Hash Join  (cost=29.25..37.53 rows=43 width=687)
               Hash Cond: (mi.connectivity_id = nc.connectivity_id)
               ->  Nested Loop  (cost=28.16..36.21 rows=43 width=655)
                     Join Filter: (mi.network_note_id = network_notes.note_id)
                     ->  Seq Scan on notes network_notes  (cost=0.00..1.01 rows=1 width=48)
                     ->  Nested Loop  (cost=28.16..34.66 rows=43 width=623)
                           Join Filter: (mi.plumbing_note_id = plumbing_notes.note_id)
                           ->  Seq Scan on notes plumbing_notes  (cost=0.00..1.01 rows=1 width=48)
                           ->  Hash Join  (cost=28.16..33.11 rows=43 width=591)
                                 Hash Cond: (mi.generic_item_id = gi.generic_item_id)
                                 ->  Hash Join  (cost=5.11..9.95 rows=43 width=559)
                                       Hash Cond: (mi.structural_note_id = structural_notes.note_id)
                                       ->  Hash Join  (cost=4.09..8.57 rows=43 width=527)
                                             Hash Cond: (mi.specification_note_id = specification_notes.note_id)
                                             ->  Hash Join  (cost=3.07..7.37 rows=43 width=495)
                                                   Hash Cond: (mi.hvac_note_id = hvac_notes.note_id)
                                                   ->  Hash Join  (cost=2.04..5.99 rows=43 width=463)
                                                         Hash Cond: (mi.electrical_note_id = electrical_notes.note_id)
                                                         ->  Hash Join  (cost=1.02..4.70 rows=43 width=431)
                                                               Hash Cond: (mi.manufacturer_id = mft.manufacturer_id)
                                                               ->  Seq Scan on mft_items mi  (cost=0.00..3.44 rows=43 width=399)
                                                                     Filter: ((deletion_date IS NULL) AND is_active)
                                                               ->  Hash  (cost=1.01..1.01 rows=1 width=48)
                                                                     ->  Seq Scan on manufacturers mft  (cost=0.00..1.01 rows=1 width=48)
                                                         ->  Hash  (cost=1.01..1.01 rows=1 width=48)
                                                               ->  Seq Scan on notes electrical_notes  (cost=0.00..1.01 rows=1 width=48)
                                                   ->  Hash  (cost=1.01..1.01 rows=1 width=48)
                                                         ->  Seq Scan on notes hvac_notes  (cost=0.00..1.01 rows=1 width=48)
                                             ->  Hash  (cost=1.01..1.01 rows=1 width=48)
                                                   ->  Seq Scan on notes specification_notes  (cost=0.00..1.01 rows=1 width=48)
                                       ->  Hash  (cost=1.01..1.01 rows=1 width=48)
                                             ->  Seq Scan on notes structural_notes  (cost=0.00..1.01 rows=1 width=48)
                                 ->  Hash  (cost=15.80..15.80 rows=580 width=48)
                                       ->  Seq Scan on generic_items gi  (cost=0.00..15.80 rows=580 width=48)
               ->  Hash  (cost=1.04..1.04 rows=4 width=36)
                     ->  Seq Scan on connectivity nc  (cost=0.00..1.04 rows=4 width=36)
         ->  Hash  (cost=1.04..1.04 rows=4 width=36)
               ->  Seq Scan on price_types pt  (cost=0.00..1.04 rows=4 width=36)
   ->  Hash  (cost=1.09..1.09 rows=9 width=48)
         ->  Seq Scan on mounting_locations ml  (cost=0.00..1.09 rows=9 width=48)
And this is the hang version:
 Hash Left Join  (cost=26.43..38.57 rows=16 width=751)
   Hash Cond: (mi.mounting_location_id = ml.mounting_location_id)
   ->  Hash Left Join  (cost=25.23..37.32 rows=16 width=719)
         Hash Cond: (mi.price_type_id = pt.price_type_id)
         ->  Hash Join  (cost=24.14..36.18 rows=16 width=687)
               Hash Cond: (mi.connectivity_id = nc.connectivity_id)
               ->  Nested Loop  (cost=23.05..35.00 rows=16 width=655)
                     Join Filter: (mi.network_note_id = network_notes.note_id)
                     ->  Seq Scan on notes network_notes  (cost=0.00..1.01 rows=1 width=48)
                     ->  Nested Loop  (cost=23.05..33.79 rows=16 width=623)
                           Join Filter: (mi.structural_note_id = structural_notes.note_id)
                           ->  Seq Scan on notes structural_notes  (cost=0.00..1.01 rows=1 width=48)
                           ->  Nested Loop  (cost=23.05..32.58 rows=16 width=591)
                                 Join Filter: (mi.electrical_note_id = electrical_notes.note_id)
                                 ->  Seq Scan on notes electrical_notes  (cost=0.00..1.01 rows=1 width=48)
                                 ->  Nested Loop  (cost=23.05..31.37 rows=16 width=559)
                                       Join Filter: (mi.specification_note_id = specification_notes.note_id)
                                       ->  Seq Scan on notes specification_notes  (cost=0.00..1.01 rows=1 width=48)
                                       ->  Nested Loop  (cost=23.05..30.16 rows=16 width=527)
                                             Join Filter: (mi.plumbing_note_id = plumbing_notes.note_id)
                                             ->  Seq Scan on notes plumbing_notes  (cost=0.00..1.01 rows=1 width=48)
                                             ->  Nested Loop  (cost=23.05..28.95 rows=16 width=495)
                                                   Join Filter: (mi.hvac_note_id = hvac_notes.note_id)
                                                   ->  Seq Scan on notes hvac_notes  (cost=0.00..1.01 rows=1 width=48)
                                                   ->  Nested Loop  (cost=23.05..27.74 rows=16 width=463)
                                                         Join Filter: (mi.manufacturer_id = mft.manufacturer_id)
                                                         ->  Seq Scan on manufacturers mft  (cost=0.00..1.01 rows=1 width=48)
                                                         ->  Hash Join  (cost=23.05..26.53 rows=16 width=431)
                                                               Hash Cond: (mi.generic_item_id = gi.generic_item_id)
                                                               ->  Seq Scan on mft_items mi  (cost=0.00..3.44 rows=16 width=399)
                                                                     Filter: ((deletion_date IS NULL) AND is_active AND (NOT is_inventory))
                                                               ->  Hash  (cost=15.80..15.80 rows=580 width=48)
                                                                     ->  Seq Scan on generic_items gi  (cost=0.00..15.80 rows=580 width=48)
               ->  Hash  (cost=1.04..1.04 rows=4 width=36)
                     ->  Seq Scan on connectivity nc  (cost=0.00..1.04 rows=4 width=36)
         ->  Hash  (cost=1.04..1.04 rows=4 width=36)
               ->  Seq Scan on price_types pt  (cost=0.00..1.04 rows=4 width=36)
   ->  Hash  (cost=1.09..1.09 rows=9 width=48)
         ->  Seq Scan on mounting_locations ml  (cost=0.00..1.09 rows=9 width=48)
I'd like to understand what I should be doing differently to escape this hang condition. Unfortunately, I'm not clear on what I'm doing wrong.

Postgres query optimizer generates bad plan after adding another order by criterion

I'm using django orm with select related and it generates the query of the form:
SELECT *
FROM "coupons_coupon"
LEFT OUTER JOIN "coupons_merchant"
ON ("coupons_coupon"."merchant_id" = "coupons_merchant"."slug")
WHERE ("coupons_coupon"."end_date" > '2020-07-10T09:10:28.101980+00:00'::timestamptz AND "coupons_coupon"."published" = true)
ORDER BY "coupons_coupon"."end_date" ASC, "coupons_coupon"."id"
LIMIT 5;
Which is then executed using the following plan:
 Limit  (cost=4363.28..4363.30 rows=5 width=604) (actual time=21.864..21.865 rows=5 loops=1)
   ->  Sort  (cost=4363.28..4373.34 rows=4022 width=604) (actual time=21.863..21.863 rows=5 loops=1)
         Sort Key: coupons_coupon.end_date, coupons_coupon.id
         Sort Method: top-N heapsort Memory: 32kB
         ->  Hash Left Join  (cost=2613.51..4296.48 rows=4022 width=604) (actual time=13.918..20.209 rows=4022 loops=1)
               Hash Cond: ((coupons_coupon.merchant_id)::text = (coupons_merchant.slug)::text)
               ->  Seq Scan on coupons_coupon  (cost=0.00..291.41 rows=4022 width=261) (actual time=0.007..1.110 rows=4022 loops=1)
                     Filter: (published AND (end_date > '2020-07-10 09:10:28.10198+00'::timestamp with time zone))
                     Rows Removed by Filter: 1691
               ->  Hash  (cost=1204.56..1204.56 rows=24956 width=331) (actual time=13.894..13.894 rows=23911 loops=1)
                     Buckets: 16384 Batches: 4 Memory Usage: 1948kB
                     ->  Seq Scan on coupons_merchant  (cost=0.00..1204.56 rows=24956 width=331) (actual time=0.003..4.681 rows=23911 loops=1)
This is a bad execution plan, since the join could be done after the left table has been filtered, ordered, and limited. When I remove the id from the ORDER BY, it generates an efficient plan, which basically could have been used for the previous query as well.
 Limit  (cost=0.57..8.84 rows=5 width=600) (actual time=0.013..0.029 rows=5 loops=1)
   ->  Nested Loop Left Join  (cost=0.57..6650.48 rows=4022 width=600) (actual time=0.012..0.028 rows=5 loops=1)
         ->  Index Scan using coupons_cou_end_dat_a8d5b7_btree on coupons_coupon  (cost=0.28..1015.77 rows=4022 width=261) (actual time=0.007..0.010 rows=5 loops=1)
               Index Cond: (end_date > '2020-07-10 09:10:28.10198+00'::timestamp with time zone)
               Filter: published
         ->  Index Scan using coupons_merchant_pkey on coupons_merchant  (cost=0.29..1.40 rows=1 width=331) (actual time=0.003..0.003 rows=1 loops=5)
               Index Cond: ((slug)::text = (coupons_coupon.merchant_id)::text)
Why is this happening? Can the optimizer be nudged to use similar plan for the former query?
I'm using postgres 12.
v13 of PostgreSQL, which should be released in the next few months, implements incremental sorting: it can read rows in a pre-sorted order based on prefix columns, then sort just the ties on those prefix column(s) by the remaining column(s), in order to get a complete sort based on more columns than an index provides. I think that will do more or less what you want.
 Limit  (cost=2.46..2.99 rows=5 width=21)
   ->  Incremental Sort  (cost=2.46..405.58 rows=3850 width=21)
         Sort Key: coupons_coupon.end_date, coupons_coupon.id
         Presorted Key: coupons_coupon.end_date
         ->  Nested Loop Left Join  (cost=0.31..253.48 rows=3850 width=21)
               ->  Index Scan using coupons_coupon_end_date_idx on coupons_coupon  (cost=0.15..54.71 rows=302 width=17)
                     Index Cond: (end_date > '2020-07-10 05:10:28.10198-04'::timestamp with time zone)
                     Filter: published
               ->  Index Only Scan using coupons_merchant_slug_idx on coupons_merchant  (cost=0.15..0.53 rows=13 width=4)
                     Index Cond: (slug = coupons_coupon.merchant_id)
Of course, just adding "id" to the current index will work under currently released versions, and even under version 13 it should be more efficient to have the index fully order the rows the way you need them.
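The multicolumn index suggested above might be sketched like this (the index name is mine; it would replace or supplement the existing end_date index):

```sql
-- Assumed name. With (end_date, id) the index delivers rows already in
-- the ORDER BY order, so the LIMIT 5 can stop after five index entries.
CREATE INDEX coupons_coupon_end_date_id_idx
    ON coupons_coupon (end_date, id);
```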

Can PostgreSQL 12 do partition pruning at execution time with subquery returning a list?

I'm trying to take advantage of partitioning in one case:
I have a table "events", partitioned by list on the field "dt_pk", which is a foreign key to the table "dates".
-- Schema
drop schema if exists test cascade;
create schema test;
-- Tables
create table if not exists test.dates (
id bigint primary key,
dt date not null
);
create sequence test.seq_events_id;
create table if not exists test.events
(
id bigint not null,
dt_pk bigint not null,
content_int bigint,
foreign key (dt_pk) references test.dates(id) on delete cascade,
primary key (dt_pk, id)
)
partition by list (dt_pk);
-- Partitions
create table test.events_1 partition of test.events for values in (1);
create table test.events_2 partition of test.events for values in (2);
create table test.events_3 partition of test.events for values in (3);
-- Fill tables
insert into test.dates (id, dt)
select id, dt
from (
select 1 id, '2020-01-01'::date as dt
union all
select 2 id, '2020-01-02'::date as dt
union all
select 3 id, '2020-01-03'::date as dt
) t;
do $$
declare
dts record;
begin
for dts in (
select id
from test.dates
) loop
for k in 1..10000 loop
insert into test.events (id, dt_pk, content_int)
values (nextval('test.seq_events_id'), dts.id, random_between(1, 1000000));
end loop;
commit;
end loop;
end;
$$;
vacuum analyze test.dates, test.events;
I want to run select like this:
select *
from test.events e
join test.dates d on e.dt_pk = d.id
where d.dt between '2020-01-02'::date and '2020-01-03'::date;
But in this case partition pruning doesn't work. That's understandable: I don't have a constant for the partition key. But from the documentation I know there is partition pruning at execution time, which works with a value obtained from a subquery:
Partition pruning can be performed not only during the planning of a
given query, but also during its execution. This is useful as it can
allow more partitions to be pruned when clauses contain expressions
whose values are not known at query planning time, for example,
parameters defined in a PREPARE statement, using a value obtained from
a subquery, or using a parameterized value on the inner side of a
nested loop join.
So I rewrote my query like this, expecting partition pruning:
select *
from test.events e
where e.dt_pk in (
select d.id
from test.dates d
where d.dt between '2020-01-02'::date and '2020-01-03'::date
);
But explain for this select says:
 Hash Join  (cost=1.07..833.07 rows=20000 width=24) (actual time=3.581..15.989 rows=20000 loops=1)
   Hash Cond: (e.dt_pk = d.id)
   ->  Append  (cost=0.00..642.00 rows=30000 width=24) (actual time=0.005..6.361 rows=30000 loops=1)
         ->  Seq Scan on events_1 e  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.005..1.104 rows=10000 loops=1)
         ->  Seq Scan on events_2 e_1  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.005..1.127 rows=10000 loops=1)
         ->  Seq Scan on events_3 e_2  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.097 rows=10000 loops=1)
   ->  Hash  (cost=1.04..1.04 rows=2 width=8) (actual time=0.006..0.006 rows=2 loops=1)
         Buckets: 1024 Batches: 1 Memory Usage: 9kB
         ->  Seq Scan on dates d  (cost=0.00..1.04 rows=2 width=8) (actual time=0.004..0.004 rows=2 loops=1)
               Filter: ((dt >= '2020-01-02'::date) AND (dt <= '2020-01-03'::date))
               Rows Removed by Filter: 1
 Planning Time: 0.206 ms
 Execution Time: 17.237 ms
So all partitions are read. I even tried to force the planner to use a nested loop join, because the documentation mentions a "parameterized value on the inner side of a nested loop join", but it didn't work:
set enable_hashjoin to off;
set enable_mergejoin to off;
And again:
 Nested Loop  (cost=0.00..1443.05 rows=20000 width=24) (actual time=9.160..25.252 rows=20000 loops=1)
   Join Filter: (e.dt_pk = d.id)
   Rows Removed by Join Filter: 30000
   ->  Append  (cost=0.00..642.00 rows=30000 width=24) (actual time=0.008..6.280 rows=30000 loops=1)
         ->  Seq Scan on events_1 e  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.105 rows=10000 loops=1)
         ->  Seq Scan on events_2 e_1  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.047 rows=10000 loops=1)
         ->  Seq Scan on events_3 e_2  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.007..1.082 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1.05 rows=2 width=8) (actual time=0.000..0.000 rows=2 loops=30000)
         ->  Seq Scan on dates d  (cost=0.00..1.04 rows=2 width=8) (actual time=0.004..0.004 rows=2 loops=1)
               Filter: ((dt >= '2020-01-02'::date) AND (dt <= '2020-01-03'::date))
               Rows Removed by Filter: 1
 Planning Time: 0.202 ms
 Execution Time: 26.516 ms
Then I noticed that every example of "partition pruning at execution time" I've seen uses only an = condition, never IN.
And it really works that way:
explain (analyze) select * from test.events e where e.dt_pk = (select id from test.dates where id = 2);
 Append  (cost=1.04..718.04 rows=30000 width=24) (actual time=0.014..3.018 rows=10000 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on dates  (cost=0.00..1.04 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
           Filter: (id = 2)
           Rows Removed by Filter: 2
   ->  Seq Scan on events_1 e  (cost=0.00..189.00 rows=10000 width=24) (never executed)
         Filter: (dt_pk = $0)
   ->  Seq Scan on events_2 e_1  (cost=0.00..189.00 rows=10000 width=24) (actual time=0.004..2.009 rows=10000 loops=1)
         Filter: (dt_pk = $0)
   ->  Seq Scan on events_3 e_2  (cost=0.00..189.00 rows=10000 width=24) (never executed)
         Filter: (dt_pk = $0)
 Planning Time: 0.135 ms
 Execution Time: 3.639 ms
And here is my final question: does partition pruning at execution time work only with a subquery returning a single value, or is there a way to get the benefit of partition pruning with a subquery returning a list?
And why doesn't it work with a nested loop join? Did I misunderstand these words:
This includes values from subqueries and values from execution-time
parameters such as those from parameterized nested loop joins.
Or "parameterized nested loop joins" is something different from regular nested loop joins?
There is no partition pruning in your nested loop join because the partitioned table is on the outer side, which is always scanned completely. The inner side is scanned with the join key from the outer side as parameter (hence parameterized scan), so if the partitioned table were on the inner side of the nested loop join, partition pruning could happen.
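If you want to see the partitioned table on the inner, parameterized side, one way to sketch it (assuming the planner chooses a nested loop here, which is not guaranteed) is a LATERAL join, where test.events is scanned once per test.dates row with dt_pk bound to a concrete value on each iteration:

```sql
-- Sketch: a per-row parameterized scan of the partitioned table.
-- With e.dt_pk = d.id known at each outer iteration, execution-time
-- pruning can skip the non-matching partitions.
SELECT ev.*
FROM test.dates d
CROSS JOIN LATERAL (
    SELECT *
    FROM test.events e
    WHERE e.dt_pk = d.id
) ev
WHERE d.dt BETWEEN '2020-01-02'::date AND '2020-01-03'::date;
```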
Partition pruning with IN lists can take place if the list values are known at plan time:
EXPLAIN (COSTS OFF)
SELECT * FROM test.events WHERE dt_pk IN (1, 2);
QUERY PLAN
---------------------------------------------------
 Append
   ->  Seq Scan on events_1
         Filter: (dt_pk = ANY ('{1,2}'::bigint[]))
   ->  Seq Scan on events_2
         Filter: (dt_pk = ANY ('{1,2}'::bigint[]))
(5 rows)
But no attempts are made to flatten a subquery, and PostgreSQL doesn't use partition pruning, even if you force the partitioned table to be on the inner side (enable_material = off, enable_hashjoin = off, enable_mergejoin = off):
EXPLAIN (ANALYZE)
SELECT * FROM test.events WHERE dt_pk IN (SELECT 1 UNION SELECT 2);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.06..2034.09 rows=20000 width=24) (actual time=0.057..15.523 rows=20000 loops=1)
   Join Filter: (events_1.dt_pk = (1))
   Rows Removed by Join Filter: 40000
   ->  Unique  (cost=0.06..0.07 rows=2 width=4) (actual time=0.026..0.029 rows=2 loops=1)
         ->  Sort  (cost=0.06..0.07 rows=2 width=4) (actual time=0.024..0.025 rows=2 loops=1)
               Sort Key: (1)
               Sort Method: quicksort Memory: 25kB
               ->  Append  (cost=0.00..0.05 rows=2 width=4) (actual time=0.006..0.009 rows=2 loops=1)
                     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
                     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
   ->  Append  (cost=0.00..642.00 rows=30000 width=24) (actual time=0.012..4.334 rows=30000 loops=2)
         ->  Seq Scan on events_1  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.011..1.057 rows=10000 loops=2)
         ->  Seq Scan on events_2  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.004..0.641 rows=10000 loops=2)
         ->  Seq Scan on events_3  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.002..0.594 rows=10000 loops=2)
 Planning Time: 0.531 ms
 Execution Time: 16.567 ms
(16 rows)
I am not certain, but it may be because the tables are so small. You might want to try with bigger tables.
If you care more about getting it working than about the fine details, and you haven't tried this yet: you can rewrite the query to something like
explain analyze select *
from test.dates d
join test.events e on e.dt_pk = d.id
where
d.dt between '2020-01-02'::date and '2020-01-03'::date
and e.dt_pk in (extract(day from '2020-01-02'::date)::int,
extract(day from '2020-01-03'::date)::int);
which will give the expected pruning.

PostgreSQL multiple joins and subqueries

When I join many tables to aggregate results I encounter the following problems:
Duplicate records, of which the intensity depends on the diversity within joined tables
Slow performance, of which sorts are the most memory consuming tasks revealed by explain analyze
My question is very simple: I want to tell PostgreSQL to stop the query from being dynamic at some point by staging the select command in steps, making certain selection criteria static - rather than dynamic.
What has to be accomplished can best be described as follows:
make a new table where I define the most basic first-level joins
save these "static" results
join the next level joins, and forget about any selection criteria that have happened in the step before that are no longer relevant for the next join
iterate through these steps until all joins are applied
However, I don't want to make separate tables. I want to achieve this goal in just one query.
Is that really too much to ask for? I want to tell PostgreSQL, for example with sub-queries, that the possibilities are limited and that it shouldn't keep worrying about sorts and what not when the previous subquery is correct.
This is an example of the query:
select a.report_id, a.object_id, a.statement, b.datapoint_id, dp.aspect_id, dp.entity_identifier_id, dp.period_id, dp.aspect_value_selection_id, dp.effective_value, c.label, r.start_id_pos, r.start_id_neg
from "..statements" a
left join table_data_points b on a.object_id = b.object_id
left join data_point dp on b.datapoint_id = dp.datapoint_id
left join "...labels" c on dp.aspect_id = c.aspect_id and a.statement = c.statement
left join ".relationships" r on a.relationship_set_id = r.relationship_set_id and dp.aspect_id = r.from_id
where a.report_id=1 and c.label is not null
Explain analyze:
 Merge Join  (cost=1151055.97..1198086.26 rows=3334642 width=200) (actual time=7295.864..7527.759 rows=178402 loops=1)
   Merge Cond: ((c.aspect_id = dp.aspect_id) AND ((c.statement)::text = (a.statement)::text))
   ->  Sort  (cost=94244.49..96166.99 rows=768997 width=34) (actual time=3772.495..3857.589 rows=381191 loops=1)
         Sort Key: c.aspect_id, c.statement
         Sort Method: quicksort Memory: 91433kB
         ->  Seq Scan on ...labels c  (cost=0.00..19064.97 rows=768997 width=34) (actual time=0.126..1851.052 rows=768511 loops=1)
               Filter: (label IS NOT NULL)
   ->  Sort  (cost=1056811.07..1057409.25 rows=239275 width=179) (actual time=3523.293..3540.153 rows=178163 loops=1)
         Sort Key: dp.aspect_id, a.statement
         Sort Method: quicksort Memory: 74kB
         ->  Merge Left Join  (cost=1028295.21..1035433.87 rows=239275 width=179) (actual time=3383.303..3522.908 rows=301 loops=1)
               Merge Cond: ((dp.aspect_id = r.from_id) AND (a.relationship_set_id = r.relationship_set_id))
               ->  Sort  (cost=896234.26..896832.45 rows=239275 width=75) (actual time=134.169..134.253 rows=289 loops=1)
                     Sort Key: dp.aspect_id, a.relationship_set_id
                     Sort Method: quicksort Memory: 65kB
                     ->  Gather  (cost=1001.14..874857.06 rows=239275 width=75) (actual time=15.204..133.907 rows=289 loops=1)
                           Workers Planned: 1
                           Workers Launched: 1
                           ->  Nested Loop  (cost=1.14..849929.56 rows=140750 width=75) (actual time=0.056..34.625 rows=145 loops=2)
                                 ->  Nested Loop  (cost=0.57..25680.40 rows=140750 width=35) (actual time=0.025..33.949 rows=145 loops=2)
                                       ->  Parallel Seq Scan on ..statements a  (cost=0.00..4150.51 rows=4 width=27) (actual time=0.012..33.840 rows=2 loops=2)
                                             Filter: (report_id = 1)
                                             Rows Removed by Filter: 116790
                                       ->  Index Scan using table_data_points_index04 on table_data_points b  (cost=0.57..5337.80 rows=4467 width=16) (actual time=0.010..0.038 rows=72 loops=4)
                                             Index Cond: (object_id = a.object_id)
                                 ->  Index Scan using data_point_pkey on data_point dp  (cost=0.57..5.85 rows=1 width=48) (actual time=0.004..0.004 rows=1 loops=289)
                                       Index Cond: (datapoint_id = b.datapoint_id)
               ->  Sort  (cost=132060.85..133822.38 rows=704613 width=128) (actual time=3249.054..3354.045 rows=264922 loops=1)
                     Sort Key: r.from_id, r.relationship_set_id
                     Sort Method: external sort Disk: 82536kB
                     ->  Seq Scan on .relationships r  (cost=0.00..63620.13 rows=704613 width=128) (actual time=0.059..1005.916 rows=704415 loops=1)
 Planning time: 47.995 ms
 Execution time: 7565.196 ms
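The staged evaluation the question asks for can be sketched in a single statement with a CTE used as an optimization fence: in PostgreSQL up to version 11 every CTE is materialized as a "static" intermediate result, and from version 12 the MATERIALIZED keyword makes that fence explicit. Table and column names below are taken from the query above; the exact split into stages is my assumption:

```sql
-- Sketch: freeze the first-level join (with its selection criteria)
-- in a materialized CTE, then build the later joins on that result.
WITH base AS MATERIALIZED (   -- drop the MATERIALIZED keyword before PG 12
    SELECT a.report_id, a.object_id, a.statement, a.relationship_set_id,
           b.datapoint_id
    FROM "..statements" a
    LEFT JOIN table_data_points b ON a.object_id = b.object_id
    WHERE a.report_id = 1     -- applied once, before the later joins
)
SELECT base.*, dp.aspect_id, dp.effective_value, c.label
FROM base
LEFT JOIN data_point dp ON base.datapoint_id = dp.datapoint_id
LEFT JOIN "...labels" c ON dp.aspect_id = c.aspect_id
                       AND base.statement = c.statement
WHERE c.label IS NOT NULL;
```

The trade-off is that the fence also prevents beneficial optimizations (such as pushing predicates into the CTE), so it is worth comparing plans with and without it.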

adding more indexes to postgres causes "out of shared memory" error

I have a rather complex query that I'm trying to optimize in Postgres 9.2. EXPLAIN ANALYZE gives this plan (explain.depesz.com):
 Merge Right Join  (cost=194965639.35..211592151.26 rows=420423258 width=616) (actual time=15898.283..15920.603 rows=17 loops=1)
   Merge Cond: ((((p.context -> 'device'::text)) = ((s.context -> 'device'::text))) AND (((p.context -> 'physical_port'::text)) = ((s.context -> 'physical_port'::text))))
   ->  Sort  (cost=68925.49..69073.41 rows=59168 width=393) (actual time=872.289..877.818 rows=39898 loops=1)
         Sort Key: ((p.context -> 'device'::text)), ((p.context -> 'physical_port'::text))
         Sort Method: quicksort Memory: 27372kB
         ->  Seq Scan on ports__status p  (cost=0.00..64235.68 rows=59168 width=393) (actual time=0.018..60.931 rows=41395 loops=1)
   ->  Materialize  (cost=194896713.86..199620346.93 rows=284223403 width=299) (actual time=15023.710..15024.779 rows=17 loops=1)
         ->  Merge Left Join  (cost=194896713.86..198909788.42 rows=284223403 width=299) (actual time=15023.705..15024.765 rows=17 loops=1)
               Merge Cond: ((((s.context -> 'device'::text)) = ((l1.context -> 'device'::text))) AND (((s.context -> 'physical_port'::text)) = ((l1.context -> 'physical_port'::text))))
               ->  Sort  (cost=194894861.42..195605419.92 rows=284223403 width=224) (actual time=14997.225..14997.230 rows=17 loops=1)
                     Sort Key: ((s.context -> 'device'::text)), ((s.context -> 'physical_port'::text))
                     Sort Method: quicksort Memory: 33kB
                     ->  GroupAggregate  (cost=100001395.98..122028709.71 rows=284223403 width=389) (actual time=14997.120..14997.186 rows=17 loops=1)
                           ->  Sort  (cost=100001395.98..100711954.49 rows=284223403 width=389) (actual time=14997.080..14997.080 rows=17 loops=1)
                                 Sort Key: ((d.context -> 'hostname'::text)), ((a.context -> 'ip_address'::text)), ((a.context -> 'mac_address'::text)), ((s.context -> 'device'::text)), ((s.context -> 'physical_port'::text)), s.created_at, s.updated_at, d.created_at, d.updated_at
                                 Sort Method: quicksort Memory: 33kB
                                 ->  Merge Join  (cost=339026.99..9576678.30 rows=284223403 width=389) (actual time=14996.710..14996.749 rows=17 loops=1)
                                       Merge Cond: (((a.context -> 'mac_address'::text)) = ((s.context -> 'mac_address'::text)))
                                       ->  Sort  (cost=15038.32..15136.00 rows=39072 width=255) (actual time=23.556..23.557 rows=1 loops=1)
                                             Sort Key: ((a.context -> 'mac_address'::text))
                                             Sort Method: quicksort Memory: 25kB
                                             ->  Hash Join  (cost=471.88..12058.33 rows=39072 width=255) (actual time=13.482..23.548 rows=1 loops=1)
                                                   Hash Cond: ((a.context -> 'ip_address'::text) = (d.context -> 'ip_address'::text))
                                                   ->  Seq Scan on arps__arps a  (cost=0.00..8132.39 rows=46239 width=157) (actual time=0.007..11.191 rows=46259 loops=1)
                                                   ->  Hash  (cost=469.77..469.77 rows=169 width=98) (actual time=0.035..0.035 rows=1 loops=1)
                                                         Buckets: 1024 Batches: 1 Memory Usage: 1kB
                                                         ->  Bitmap Heap Scan on ipam__dns d  (cost=9.57..469.77 rows=169 width=98) (actual time=0.023..0.023 rows=1 loops=1)
                                                               Recheck Cond: ((context -> 'hostname'::text) = 'zglast-oracle03.slac.stanford.edu'::text)
                                                               ->  Bitmap Index Scan on ipam__dns_hostname_index  (cost=0.00..9.53 rows=169 width=0) (actual time=0.017..0.017 rows=1 loops=1)
                                                                     Index Cond: ((context -> 'hostname'::text) = 'blah'::text)
                                       ->  Sort  (cost=323988.67..327625.84 rows=1454870 width=134) (actual time=14973.118..14973.120 rows=18 loops=1)
                                             Sort Key: ((s.context -> 'mac_address'::text))
                                             Sort Method: external sort Disk: 214176kB
                                             ->  Result  (cost=0.00..175064.84 rows=1454870 width=134) (actual time=0.016..1107.604 rows=1265154 loops=1)
                                                   ->  Append  (cost=0.00..175064.84 rows=1454870 width=134) (actual time=0.013..796.578 rows=1265154 loops=1)
                                                         ->  Seq Scan on spanning_tree__neighbour s  (cost=0.00..0.00 rows=1 width=98) (actual time=0.000..0.000 rows=0 loops=1)
                                                               Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                         ->  Seq Scan on spanning_tree__neighbour__vlan38 s  (cost=0.00..469.32 rows=1220 width=129) (actual time=0.011..1.019 rows=823 loops=1)
                                                               Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                               Rows Removed by Filter: 403
                                                         ->  Seq Scan on spanning_tree__neighbour__vlan3 s  (cost=0.00..270.20 rows=1926 width=139) (actual time=0.017..0.971 rows=1882 loops=1)
                                                               Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                               Rows Removed by Filter: 54
                                                         ->  Seq Scan on spanning_tree__neighbour__vlan466 s  (cost=0.00..131.85 rows=306 width=141) (actual time=0.032..0.340 rows=276 loops=1)
                                                               Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                               Rows Removed by Filter: 32
                                                         ->  Seq Scan on spanning_tree__neighbour__vlan465 s  (cost=0.00..208.57 rows=842 width=142) (actual time=0.005..0.622 rows=768 loops=1)
                                                               Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                               Rows Removed by Filter: 78
                                                         ->  Seq Scan on spanning_tree__neighbour__vlan499 s  (cost=0.00..245.04 rows=481 width=142) (actual time=0.017..0.445 rows=483 loops=1)
                                                               Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                         ->  Seq Scan on spanning_tree__neighbour__vlan176 s  (cost=0.00..346.36 rows=2576 width=131) (actual time=0.008..1.443 rows=2051 loops=1)
Filter: ((context -> 'physical_port'::text) IS NOT NULL)
Rows Removed by Filter: 538
I'm a bit of a novice at reading the plan, but I think it all comes down to the fact that I have the table spanning_tree__neighbour (which I've partitioned into numerous 'vlan' tables). As you can see, it's performing a seq scan on every partition.
So I wrote a quick-and-dirty bash script to create indexes for the child tables:
create index spanning_tree__neighbour__vlan1_physical_port_index ON spanning_tree__neighbour__vlan1((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
create index spanning_tree__neighbour__vlan2_physical_port_index ON spanning_tree__neighbour__vlan2((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
create index spanning_tree__neighbour__vlan3_physical_port_index ON spanning_tree__neighbour__vlan3((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
...
But after creating a hundred or so of them, any query gives:
=> explain analyze select * from hosts where hostname='blah';
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Time: 34.757 ms
Will setting max_locks_per_transaction actually help? What value should I use, given that my partitioned table has up to 4096 child tables?
Or have I read the plan wrong?
Will setting max_locks_per_transaction actually help?
No, it won't.
Not before you fix your schema and your query, anyway.
A few things pop out, some of them already mentioned in the comments. In no particular order:
Stats are off. ANALYZE your tables and, if you determine that autovacuum doesn't have enough memory to do its job properly, increase maintenance_work_mem.
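As a minimal sketch of that step (table names are taken from the plan above; the memory figure is an illustrative guess, not a recommendation for your hardware):

```sql
-- Refresh planner statistics for the tables appearing in the plan
ANALYZE arps__arps;
ANALYZE ipam__dns;
ANALYZE spanning_tree__neighbour;

-- Give VACUUM/ANALYZE and index builds more room to work in.
-- 256MB is an example value; size it to the RAM you can spare.
SET maintenance_work_mem = '256MB';
```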
Steps like Sort Method: external sort Disk: 214176kB indicate that you're sorting rows on disk. Increase work_mem accordingly.
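The plan shows that sort spilling ~214MB to disk, so something on that order is needed to keep it in memory. For example (the exact value is an assumption; tune it to your workload, and prefer a per-role setting over a global one):

```sql
-- Per-session, for experimentation:
SET work_mem = '256MB';

-- Or persist it for a single role rather than server-wide
-- ("reporting_user" is a hypothetical role name):
ALTER ROLE reporting_user SET work_mem = '256MB';
```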
Steps like Seq Scan on spanning_tree__neighbour__vlan176 s (cost=0.00..346.36 rows=2576 width=131) (actual time=0.008..1.443 rows=2051 loops=1) feeding into an Append node are dubious at best.
Look… Partition tables when you want to turn something unmanageable or impractical into something more manageable, e.g. pushing a few billion rows of old data out of the way of the couple of million that are used on a daily basis. Not to turn a couple of million rows into 4,096 puny tables holding a pathetically small ~1k rows each on average.
The next offender is things like Filter: ((context -> 'physical_port'::text) IS NOT NULL). ARGH.
Never, ever store things in hstore, JSON, XML or any other kind of EAV (entity-attribute-value) store if you care about the data that lands in it, in particular if it appears in a WHERE, JOIN or sort (!) clause. No ifs, no buts: just change your schema.
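A rough sketch of what such a migration could look like, with column names assumed from the hstore keys visible in the plan:

```sql
-- Promote the hstore keys used in WHERE/JOIN/ORDER BY to real columns
ALTER TABLE spanning_tree__neighbour
    ADD COLUMN device        text,
    ADD COLUMN physical_port text;

-- Backfill from the existing hstore column
UPDATE spanning_tree__neighbour
SET    device        = context -> 'device',
       physical_port = context -> 'physical_port';

-- Plain b-tree indexes now work, and the planner gets real statistics
CREATE INDEX ON spanning_tree__neighbour (device, physical_port);
```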
Plus, a bunch of the fields that appear in your query could be conveniently stored using Postgres' network types instead of dumb text. Odds are they should all be indexed, too. (They wouldn't appear in the plan if they shouldn't.)
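For instance, the IP and MAC fields could use inet and macaddr, which validate their input and index well (column names here are assumptions based on the hstore keys in the plan):

```sql
-- Typed columns instead of text pulled out of hstore
ALTER TABLE ipam__dns  ADD COLUMN ip_address  inet;
ALTER TABLE arps__arps ADD COLUMN mac_address macaddr;

CREATE INDEX ON ipam__dns (ip_address);
CREATE INDEX ON arps__arps (mac_address);
```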
You have a step that does a GroupAggregate beneath a left join. Typically, this indicates a query like: … left join (select agg_fn(…) … group by …) foo …. That's a big no-no in my experience. Pull that out of your query if you can.
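Schematically, the shape to avoid looks something like this (generic, illustrative names, not your actual query):

```sql
-- Anti-pattern: left-joining against a grouped subquery forces the
-- aggregate to be computed over the whole inner table before the join
SELECT t.*, agg.cnt
FROM   some_table t
LEFT JOIN (SELECT key, count(*) AS cnt
           FROM   other_table
           GROUP  BY key) agg
       ON agg.key = t.key;
```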
The plan is too long and unreadable to guess why it's doing that exactly, but if select * from hosts where hostname='blah'; is anything to go by, you seem to be selecting absolutely every possible thing you can access in one query.
It's a lot cheaper, and faster, to find the select few rows that you actually want, and then run a handful of other queries to select the related data. So do so.
If you still need to join with that aggregate subquery for some reason, be sure to look into window functions. More often than not, they'll spare you the need for gory joins, by allowing you to run the aggregate on the current set of rows directly.
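For instance, instead of joining against a grouped subquery, a window function attaches the aggregate to each row of the current set directly (generic, illustrative names):

```sql
-- The aggregate is computed in-line over a partition of the rows
-- the query already selects; no extra join needed
SELECT t.*,
       count(*) OVER (PARTITION BY t.key) AS rows_per_key
FROM   some_table t;
```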
Once you've done these steps, the default max_locks_per_transaction should be fine.