Simple equijoin between PostgreSQL partitions is taking a long time (10 minutes)

We are using a PostgreSQL database and I am hitting some limits. The company_sale_account table is partitioned
by company name.
We generate a report on the accounts matched between two of the partitions. Below is the query:
SELECT cpsa1.*
FROM company_sale_account cpsa1
JOIN company_sale_account cpsa2 ON cpsa1.sale_account_id = cpsa2.sale_account_id
WHERE cpsa1.company_name = 'company_a'
AND cpsa2.company_name = 'company_b'
We have set up BTREE indexes on the sale_account_id column of both tables.
This worked fine until recently. Now we have 10 million rows in the
company_a partition and 7 million rows in the company_b partition, and this query takes
more than 10 minutes.
Below is the explain plan output for it:
Sort (cost=167950986.43..168904299.23 rows=381325118 width=132) (actual time=517017.334..603691.048 rows=16854094 loops=1)
Sort Key: cpsa1.crm_account_id, ((cpsa1.account_name)::text),((cpsa1.account_owner)::text), ((cpsa1.account_type)::text), cpsa1.is_customer, ((date_part('epoch'::text,cpsa1.created_date))::integer),((hstore_to_json(cpsa1.custom_crm_fields))::tex (...)
Sort Method: external merge Disk: 2862656kB
Buffers: shared hit=20125996 read=47811 dirtied=75, temp read=1333427 written=1333427
I/O Timings: read=19619.322
-> Nested Loop (cost=0.00..9331268.39 rows=381325118 width=132) (actual time=1.680..118698.570 rows=16854094 loops=1)
Buffers: shared hit=20125977 read=47811 dirtied=75
I/O Timings: read=19619.322
-> Append (cost=0.00..100718.94 rows=2033676 width=33) (actual time=0.014..1783.243 rows=2033675 loops=1)
Buffers: shared hit=75298 dirtied=75
-> Seq Scan on company_sale_account cpsa2 (cost=0.00..0.00 rows=1 width=516) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((company_name)::text = 'company_b'::text)
-> Seq Scan on company_sale_account_concur cpsa2_1 (cost=0.00..100718.94 rows=2033675 width=33) (actual time=0.013..938.145 rows=2033675 loops=1)
Filter: ((company_name)::text = 'company_b'::text)
Buffers: shared hit=75298 dirtied=75
-> Append (cost=0.00..1.97 rows=23 width=355) (actual time=0.034..0.047 rows=8 loops=2033675)
Buffers: shared hit=20050679 read=47811
I/O Timings: read=19619.322
-> Seq Scan on company_sale_account cpsa1 (cost=0.00..0.00 rows=1 width=4525) (actual time=0.000..0.000 rows=0 loops=2033675)
Filter: (((company_name)::text = 'company_a'::text) AND ((cpsa2.sale_account_id)::text = (sale_account_id)::text))
-> Index Scan using ix_csa_adp_sale_account on company_sale_account_adp cpsa1_1 (cost=0.56..1.97 rows=22 width=165) (actual time=0.033..0.042 rows=8 loops=2033675)
Index Cond: ((sale_account_id)::text = (cpsa2.sale_account_id)::text)
Filter: ((company_name)::text = 'company_a'::text)
Buffers: shared hit=20050679 read=47811
I/O Timings: read=19619.322
Planning time: 30.853 ms
Execution time: 618218.321 ms
Do you have any suggestions on how to tune Postgres?
Please share your thoughts; it would be a great help to me.
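The Buffers and I/O Timings lines in the plan above are what EXPLAIN reports when it is run with the ANALYZE and BUFFERS options while track_io_timing is enabled. A minimal sketch of how such a plan could be captured for the query as posted follows. It assumes the session has the privilege to change track_io_timing (otherwise the setting has to be enabled in the server configuration). Note that the posted plan contains a Sort node, so the statement that was actually run presumably had an ORDER BY that is not shown in the question.

```sql
-- Assumption: the current role may set track_io_timing (superuser by default).
SET track_io_timing = on;

-- ANALYZE executes the query and reports actual times and row counts;
-- BUFFERS adds the shared/temp block counters seen in the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT cpsa1.*
FROM company_sale_account cpsa1
JOIN company_sale_account cpsa2
  ON cpsa1.sale_account_id = cpsa2.sale_account_id
WHERE cpsa1.company_name = 'company_a'
  AND cpsa2.company_name = 'company_b';
```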

Related

Sudden drop in PostgreSQL memory

Version: 13.6
Hi,
This issue is two days old. Our PostgreSQL is hosted on AWS RDS. Suddenly our free memory dropped from 1300 MB to 23 MB and swap usage increased from 150 MB to 1200 MB. CPU was a little high, and read and write IOPS were high for 2-3 hours and low after that.
The average row size was 65 bytes for tbl1 and 350 bytes for tbl2.
In Performance Insights, only one query accounted for most of the wait on CPU. Below is the explain plan of that query. I want to understand why this query led to such a large drop in free memory and increase in swap usage.
Once we created a combined index on (hotel_id, checkin) INCLUDE (booking_id), free memory became stable.
Query:
database1 => EXPLAIN(ANALYZE, VERBOSE, BUFFERS, COSTS, TIMING)
database1-> select l.*, o.* FROM tbl1 o inner join tbl2 l on o.booking_id = l.booking_id where o.checkin = '2022-09-09' and o.hotel_id = 95315 and l.status = 'YES_RM';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3014.65..3077.08 rows=1 width=270) (actual time=71.157..71.159 rows=0 loops=1)
Output: l.*,o.*
Inner Unique: true
Buffers: shared hit=3721 read=170 dirtied=4
I/O Timings: read=8.173
-> Bitmap Heap Scan on public.tbl1 o (cost=3014.08..3034.13 rows=5 width=74) (actual time=69.447..70.814 rows=8 loops=1)
Output: o.booking_id, o.checkin, o.created_at, o.hotel_id, o.hotel_time_zone, o.invoice, o.booking_created_at, o.updated_at, o.user_profile_id
Recheck Cond: ((o.hotel_id = 95315) AND (o.checkin = '2022-09-09'::date))
Rows Removed by Index Recheck: 508
Heap Blocks: exact=512
Buffers: shared hit=3683 read=163
I/O Timings: read=8.148
-> BitmapAnd (cost=3014.08..3014.08 rows=5 width=0) (actual time=69.182..69.183 rows=0 loops=1)
Buffers: shared hit=3312 read=19
I/O Timings: read=7.812
-> Bitmap Index Scan on idx33t1pn8h284cj2rg6pa33bxqb (cost=0.00..167.50 rows=6790 width=0) (actual time=8.550..8.550 rows=3909 loops=1)
Index Cond: (o.hotel_id = 95315)
Buffers: shared hit=5 read=19
I/O Timings: read=7.812
-> Bitmap Index Scan on idx6xuah53fdtx4p5afo5cbgul47 (cost=0.00..2846.33 rows=114368 width=0) (actual time=59.391..59.391 rows=297558 loops=1)
Index Cond: (o.checkin = '2022-09-09'::date)
Buffers: shared hit=3307
-> Index Scan using idxicjnq3084849d8vvmnhrornn7 on public.tbl2 l (cost=0.57..8.59 rows=1 width=204) (actual time=0.042..0.042 rows=0 loops=8)
Output: l.*
Index Cond: (l.booking_id = o.booking_id)
Filter: ((l.status)::text = 'YES_RM'::text)
Rows Removed by Filter: 1
Buffers: shared hit=38 read=7 dirtied=4
I/O Timings: read=0.024
Planning:
Buffers: shared hit=335 read=57
I/O Timings: read=0.180
Planning Time: 1.472 ms
Execution Time: 71.254 ms
(34 rows)
After Index creation:
Nested Loop (cost=1.14..65.02 rows=1 width=270) (actual time=6.640..6.640 rows=0 loops=1)
Output: l.*,o.*
Inner Unique: true
Buffers: shared hit=54 read=16 dirtied=4
I/O Timings: read=5.554
-> Index Scan using hotel_id_check_in_index on public.tbl1 o (cost=0.57..22.07 rows=5 width=74) (actual time=2.105..5.361 rows=11 loops=1)
Output: o.*
Index Cond: ((o.hotel_id = 95315) AND (o.checkin = '2022-09-09'::date))
Buffers: shared hit=8 read=13 dirtied=4
I/O Timings: read=4.354
-> Index Scan using idxicjnq3084849d8vvmnhrornn7 on public.tbl2 l (cost=0.57..8.59 rows=1 width=204) (actual time=0.116..0.116 rows=0 loops=11)
Output: l.*
Index Cond: (l.booking_id = o.booking_id)
Filter: ((l.status)::text = 'YES_RM'::text)
Rows Removed by Filter: 0
Buffers: shared hit=46 read=3
I/O Timings: read=1.199
Planning:
Buffers: shared hit=416
Planning Time: 1.323 ms
Execution Time: 6.667 ms
(21 rows)
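For reference, the combined index described in the question could be created with a statement along the following lines. This is a sketch rather than the poster's actual DDL: the index name is taken from the second plan, the column list from the description, and CONCURRENTLY is an assumption added here to avoid blocking writes on a live table.

```sql
-- Index name as it appears in the "after" plan; column list follows the
-- description in the question: the two filter columns as keys, booking_id
-- carried along as a non-key column.
-- CONCURRENTLY is an assumption (not stated in the question); it cannot be
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY hotel_id_check_in_index
    ON tbl1 (hotel_id, checkin)
    INCLUDE (booking_id);
```

With both filter columns in one index, the planner no longer has to build and intersect two bitmaps (the BitmapAnd over 3909 and 297558 index entries in the first plan), which is consistent with shared buffer hits falling from 3721 to 54 between the two plans.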

Picking random row from query with an exists where clause seems slow

I am trying to pick a random row from a table using what seems like a simple condition: the row must have a specific type and must also be present in another table.
select * from table1 t1 where type='Other' and exists (select 1 from table2 t2 where t2.id = t1.id FETCH FIRST ROW ONLY) order by random() limit 1;
I am trying to cut the lookup in the second table short by taking only the first row it finds; I am really only interested in whether any row exists in that table, so one will do. However, this made no real difference to the run time. I've also tried joining the tables instead, which seemed to take twice as long. Right now the query takes a little over a minute to run, and I am trying to cut this down to a few seconds, 10 at most.
Any ideas? The type can change from query to query, so it needs to be a generic index, not one specific to "Other"; there are hundreds of types.
table1 has over 10M rows with unique ids; table2 has over 95M.
I have indexes on the ids as well as on type:
CREATE INDEX type_idx ON table1 USING btree (type);
CREATE INDEX id_idx ON table2 USING btree (id);
CREATE UNIQUE INDEX table1_pkey ON table1 USING btree (id);
Here is the explain plan:
Limit (cost=297536.80..297536.80 rows=1 width=51) (actual time=68436.446..68456.389 rows=1 loops=1)
Buffers: shared hit=1503764 read=299217
I/O Timings: read=199577.751
-> Sort (cost=297536.80..297586.31 rows=19807 width=51) (actual time=68436.444..68456.386 rows=1 loops=1)
Sort Key: (random())
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1503764 read=299217
I/O Timings: read=199577.751
-> Gather (cost=7051.90..297437.76 rows=19807 width=51) (actual time=117.271..68418.453 rows=58327 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1503764 read=299217
I/O Timings: read=199577.751
-> Nested Loop Semi Join (cost=6051.90..294407.54 rows=8253 width=43) (actual time=84.291..68358.619 rows=19442 loops=3)
Buffers: shared hit=1503764 read=299217
I/O Timings: read=199577.751
-> Parallel Bitmap Heap Scan on table1 t1 (cost=6051.46..135601.49 rows=225539 width=43) (actual time=83.250..24802.725 rows=185267 loops=3)
Recheck Cond: ((type)::text = 'Other'::text)
Rows Removed by Index Recheck: 1119917
Heap Blocks: exact=20174 lossy=11038
Buffers: shared read=94319
I/O Timings: read=72301.594
-> Bitmap Index Scan on type_idx (cost=0.00..5916.13 rows=541293 width=0) (actual time=89.207..89.208 rows=555802 loops=1)
Index Cond: ((type)::text = 'Other'::text)
Buffers: shared read=470
I/O Timings: read=33.209
-> Index Only Scan using id_idx on events (cost=0.44..65.15 rows=257 width=8) (actual time=0.234..0.234 rows=0 loops=555802)
Index Cond: (t2.id = t1.id)
Heap Fetches: 461
Buffers: shared hit=1503764 read=204898
I/O Timings: read=127276.157
Planning:
Buffers: shared hit=8 read=8
I/O Timings: read=3.139
Planning Time: 5.713 ms
Execution Time: 68457.688 ms
Here is the explain plan after I changed the type index to also include the id
Limit (cost=305876.92..305876.92 rows=1 width=51) (actual time=81055.897..81077.393 rows=1 loops=1)
Buffers: shared hit=1501397 read=303247
I/O Timings: read=237093.600
-> Sort (cost=305876.92..305926.44 rows=19807 width=51) (actual time=81055.895..81077.390 rows=1 loops=1)
Sort Key: (random())
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1501397 read=303247
I/O Timings: read=237093.600
-> Gather (cost=15392.02..305777.89 rows=19807 width=51) (actual time=87.662..81032.107 rows=58327 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1501397 read=303247
I/O Timings: read=237093.600
-> Nested Loop Semi Join (cost=14392.02..302747.67 rows=8253 width=43) (actual time=73.967..80990.425 rows=19442 loops=3)
Buffers: shared hit=1501397 read=303247
I/O Timings: read=237093.600
-> Parallel Bitmap Heap Scan on table1 t1 (cost=14391.58..143941.61 rows=225539 width=43) (actual time=73.193..20476.307 rows=185267 loops=3)
Recheck Cond: ((type)::text = 'Other'::text)
Rows Removed by Index Recheck: 1124091
Heap Blocks: exact=20346 lossy=11134
Buffers: shared read=95982
I/O Timings: read=59211.444
-> Bitmap Index Scan on type_idx (cost=0.00..14256.26 rows=541293 width=0) (actual time=73.552..73.552 rows=555802 loops=1)
Index Cond: ((type)::text = 'Other'::text)
Buffers: shared read=2133
I/O Timings: read=6.812
-> Index Only Scan using id_idx on table2 (cost=0.44..65.15 rows=257 width=8) (actual time=0.326..0.326 rows=0 loops=555802)
Index Cond: (t2.id = t1.id)
Heap Fetches: 461
Buffers: shared hit=1501397 read=207265
I/O Timings: read=177882.156
Planning:
Buffers: shared hit=29 read=10
I/O Timings: read=4.789
Planning Time: 11.993 ms
Execution Time: 81078.404 ms
Could you try a combination of columns in your index:
CREATE INDEX type_id_idx ON table1 USING btree (type, id);
How does this change the query plan?
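One way to answer that is to create the suggested index and capture a fresh plan in the same session. The sketch below reuses the index definition from the suggestion and the query from the question. The ANALYZE step is an addition here, intended only to make sure planner statistics are current before the plans are compared.

```sql
-- Composite index as suggested: type first (equality filter), then id.
CREATE INDEX type_id_idx ON table1 USING btree (type, id);

-- Assumption: refresh planner statistics before comparing plans.
ANALYZE table1;

-- Capture the new plan with timing and buffer counters for comparison.
-- EXISTS already stops at the first matching row, so the
-- FETCH FIRST ROW ONLY from the original query is omitted here.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM table1 t1
WHERE type = 'Other'
  AND EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
ORDER BY random()
LIMIT 1;
```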

SELECT query waiting on DataFileWrite in Postgres

I have a partitioned table on which I run both queries and INSERTs simultaneously from multiple sessions. The database is Postgres 14.1.
Running select wait_event_type, wait_event, state, query from pg_stat_activity often reports that my queries (plain SELECTs on an indexed column) are waiting on the DataFileWrite wait event, type IO.
I don't understand: why would a query wait on a file write?
Limit (cost=210.00..210.00 rows=1 width=14) (actual time=3.279..3.281 rows=0 loops=1)
Buffers: shared hit=27 read=7
I/O Timings: read=3.044
-> Sort (cost=210.00..210.11 rows=43 width=14) (actual time=3.278..3.280 rows=0 loops=1)
Sort Key: mldata.sinceepoch DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=27 read=7
I/O Timings: read=3.044
-> Append (cost=0.56..209.79 rows=43 width=14) (actual time=3.274..3.276 rows=0 loops=1)
Buffers: shared hit=27 read=7
I/O Timings: read=3.044
-> Index Scan using mldata_313_destination_idx on mldata_313 mldata_1 (cost=0.56..20.67 rows=4 width=14) (actual time=0.056..0.056 rows=0 loops=1)
Index Cond: (destination = 154328765)
Filter: (day = ANY ('{313,314,315,316,317,318,319,320}'::integer[]))
Buffers: shared hit=6 read=1
I/O Timings: read=0.010
....... a few more of IndexScan .........
Planning:
Buffers: shared hit=1340 read=136 written=11
I/O Timings: read=0.910 **write=0.138**
Planning Time: 3.920 ms
Execution Time: 3.338 ms
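The pg_stat_activity check described in the question can be narrowed so that only the affected sessions are listed. A minimal sketch, using only columns of the standard pg_stat_activity view; the filter on backend_type is an addition here, meant to exclude background processes such as the checkpointer:

```sql
-- List client sessions currently waiting on a data file write.
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
  AND wait_event = 'DataFileWrite'
  AND backend_type = 'client backend';
```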

Append Cost very high on Partitioned table

I have a query joining two tables that are partitioned on a timestamp column. Both tables are filtered to the current date's partition, but the query is unusually slow and the Append cost of the driving table is very high.
Query and Plan : https://explain.dalibo.com/plan/wVA
Nested Loop (cost=0.56..174042.82 rows=16 width=494) (actual time=0.482..20.133 rows=1713 loops=1)
Output: tran.transaction_status, mgwor.apx_transaction_id, org.organisation_name, mgwor.order_status, mgwor.request_date, mgwor.response_date, (date_part('epoch'::text, mgwor.response_date) - date_part('epoch'::text, mgwor.request_date))
Buffers: shared hit=5787 dirtied=3
-> Nested Loop (cost=0.42..166837.32 rows=16 width=337) (actual time=0.459..7.803 rows=1713 loops=1)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, org.organisation_name
Join Filter: ((account.account_id)::text = (mgwor.account_id)::text)
Rows Removed by Join Filter: 3007
Buffers: shared hit=589
-> Nested Loop (cost=0.27..40.66 rows=4 width=54) (actual time=0.203..0.483 rows=2 loops=1)
Output: account.account_id, org.organisation_name
Join Filter: ((account.organisation_id)::text = (org.organisation_id)::text)
Rows Removed by Join Filter: 289
Buffers: shared hit=27
-> Index Scan using account_pkey on mdm.account (cost=0.27..32.55 rows=285 width=65) (actual time=0.013..0.122 rows=291 loops=1)
Output: account.account_id, account.account_created_at, account.account_name, account.account_status, account.account_valid_until, account.currency_id, account.organisation_id, account.organisation_psp_id, account."account_threeDS_required", account.account_use_webhook, account.account_webhook_url, account.account_webhook_max_attempt, account.reporting_account_id, account.card_type, account.country_id, account.product_id
Buffers: shared hit=24
-> Materialize (cost=0.00..3.84 rows=1 width=55) (actual time=0.000..0.000 rows=1 loops=291)
Output: org.organisation_name, org.organisation_id
Buffers: shared hit=3
-> Seq Scan on mdm.organisation_smd org (cost=0.00..3.84 rows=1 width=55) (actual time=0.017..0.023 rows=1 loops=1)
Output: org.organisation_name, org.organisation_id
Filter: ((org.organisation_name)::text = 'ABC'::text)
Rows Removed by Filter: 67
Buffers: shared hit=3
-> Materialize (cost=0.15..166576.15 rows=3835 width=473) (actual time=0.127..2.826 rows=2360 loops=2)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
Buffers: shared hit=562
-> Append (cost=0.15..166556.97 rows=3835 width=473) (actual time=0.252..3.661 rows=2360 loops=1)
Buffers: shared hit=562
Subplans Removed: 1460
-> Bitmap Heap Scan on public.mgworderrequest_part_20200612 mgwor (cost=50.98..672.23 rows=2375 width=91) (actual time=0.251..2.726 rows=2360 loops=1)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
Recheck Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
Filter: ((mgwor.request_date >= date(now())) AND (mgwor.request_date < (date(now()) + 1)))
Heap Blocks: exact=549
Buffers: shared hit=562
-> Bitmap Index Scan on mgworderrequest_part_20200612_request_type_idx (cost=0.00..50.38 rows=2375 width=0) (actual time=0.191..0.192 rows=2361 loops=1)
Index Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
Buffers: shared hit=13
-> Append (cost=0.14..435.73 rows=1461 width=316) (actual time=0.005..0.006 rows=1 loops=1713)
Buffers: shared hit=5198 dirtied=3
Subplans Removed: 1460
-> Index Scan using transaction_part_20200612_pkey on public.transaction_part_20200612 tran (cost=0.29..0.87 rows=1 width=42) (actual time=0.004..0.005 rows=1 loops=1713)
Output: tran.transaction_status, tran.transaction_id
Index Cond: (((tran.transaction_id)::text = (mgwor.apx_transaction_id)::text) AND (tran.transaction_created_at >= date(now())) AND (tran.transaction_created_at < (date(now()) + 1)))
Filter: (tran.transaction_status IS NOT NULL)
Buffers: shared hit=5198 dirtied=3
Planning Time: 19535.308 ms
Execution Time: 21.006 ms
Partition pruning is working on both tables.
Am I missing something obvious here?
Thanks,
VA
I don't know why the cost estimate for the append is so large, but presumably you are really worried about how long this takes, not how large the estimate is. As noted, the actual time is going to planning, not to execution.
A likely explanation is that it was waiting on a lock. Time spent waiting on a table lock for a partition table (but not for the parent table) gets attributed to planning time.
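To test the lock hypothesis, the waiting session can be inspected from a second connection while the slow statement is still running. A minimal sketch, using the standard pg_stat_activity view and the pg_blocking_pids() function; it only shows something if it is run during the wait:

```sql
-- Sessions currently waiting on a heavyweight lock, with the PIDs blocking them.
SELECT a.pid,
       a.wait_event_type,
       a.wait_event,
       pg_blocking_pids(a.pid) AS blocked_by,
       a.query
FROM pg_stat_activity AS a
WHERE a.wait_event_type = 'Lock';
```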

Search queries slow on Materialized Views of a restored dump as compared to the original DB

I have a PostgreSQL DB containing various materialized views. Running a query, say query_a, takes 2800 ms the first time and 53 ms when the same query is re-run. This can be explained by PostgreSQL's caching. Now comes the tricky part: I create a dump of this DB and restore it into NewDB. When I run query_a there I get an execution time of 2253 ms, and re-running it gives the same time, i.e., it seems that the caching is not working on NewDB.
I ran various experiments to fix this and noticed that there is no improvement when I explicitly refresh the views, but if I drop the views and re-create them in NewDB, I get the original performance.
Note that the data is constant in DB and NewDB and I have used the commands mentioned here for dump creation and restore.
The result for re running the query on DB is ->
The results for running the same query on NewDB for 1st and 2nd time are as follows ->
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=113790614477.61..113790614477.62 rows=1 width=8) (actual time=2284.605..2284.605 rows=1 loops=1)
Buffers: shared hit=3540872
CTE t
-> Merge Join (cost=40600.92..11846650.56 rows=763041594 width=425) (actual time=3.693..1909.916 rows=6005 loops=1)
Merge Cond: (n.node_id = nd.node_id)
Buffers: shared hit=3524063
-> Index Scan using nodes_node_id on nodes n (cost=0.43..350865.91 rows=3824099 width=389) (actual time=0.014..1651.025 rows=3598491 loops=1)
Buffers: shared hit=3523372
-> Sort (cost=40600.49..40700.26 rows=39907 width=40) (actual time=3.668..4.227 rows=6005 loops=1)
Sort Key: nd.node_id
Sort Method: quicksort Memory: 623kB
Buffers: shared hit=691
-> Bitmap Heap Scan on nodes_depths nd (cost=1153.11..37550.73 rows=39907 width=40) (actual time=0.627..2.846 rows=6005 loops=1)
Recheck Cond: ((ancestor_1 = 1) OR (ancestor_2 = 1))
Heap Blocks: exact=658
Buffers: shared hit=691
-> BitmapOr (cost=1153.11..1153.11 rows=40007 width=0) (actual time=0.547..0.547 rows=0 loops=1)
Buffers: shared hit=33
-> Bitmap Index Scan on nodes_depths_1 (cost=0.00..566.58 rows=20003 width=0) (actual time=0.032..0.032 rows=156 loops=1)
Index Cond: (ancestor_1 = 1)
Buffers: shared hit=4
-> Bitmap Index Scan on nodes_depths_2 (cost=0.00..566.58 rows=20003 width=0) (actual time=0.515..0.515 rows=5849 loops=1)
Index Cond: (ancestor_2 = 1)
Buffers: shared hit=29
-> Merge Right Join (cost=169565733.26..97549168801.28 rows=6491839610305 width=0) (actual time=1915.721..2284.175 rows=6005 loops=1)
Merge Cond: (nodes_fts.node_id = t.node_id)
Buffers: shared hit=3540872
-> Index Only Scan using nodes_fts_idx on nodes_fts (cost=0.43..97055.96 rows=1701569 width=4) (actual time=0.041..277.890 rows=1598712 loops=1)
Heap Fetches: 1598712
Buffers: shared hit=16805
-> Materialize (cost=169565732.84..173380940.81 rows=763041594 width=4) (actual time=1915.675..1916.583 rows=6005 loops=1)
Buffers: shared hit=3524067
-> Sort (cost=169565732.84..171473336.82 rows=763041594 width=4) (actual time=1915.672..1916.057 rows=6005 loops=1)
Sort Key: t.node_id
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=3524067
-> CTE Scan on t (cost=0.00..15260831.88 rows=763041594 width=4) (actual time=3.698..1914.771 rows=6005 loops=1)
Buffers: shared hit=3524063
Planning time: 68.064 ms
Execution time: 2285.084 ms
(40 rows)
and for the second run ->
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=113790614477.61..113790614477.62 rows=1 width=8) (actual time=2295.319..2295.319 rows=1 loops=1)
Buffers: shared hit=3540868
CTE t
-> Merge Join (cost=40600.92..11846650.56 rows=763041594 width=425) (actual time=15.324..1926.744 rows=6005 loops=1)
Merge Cond: (n.node_id = nd.node_id)
Buffers: shared hit=3524063
-> Index Scan using nodes_node_id on nodes n (cost=0.43..350865.91 rows=3824099 width=389) (actual time=0.027..1648.277 rows=3598491 loops=1)
Buffers: shared hit=3523372
-> Sort (cost=40600.49..40700.26 rows=39907 width=40) (actual time=15.254..15.903 rows=6005 loops=1)
Sort Key: nd.node_id
Sort Method: quicksort Memory: 623kB
Buffers: shared hit=691
-> Bitmap Heap Scan on nodes_depths nd (cost=1153.11..37550.73 rows=39907 width=40) (actual time=3.076..10.752 rows=6005 loops=1)
Recheck Cond: ((ancestor_1 = 1) OR (ancestor_2 = 1))
Heap Blocks: exact=658
Buffers: shared hit=691
-> BitmapOr (cost=1153.11..1153.11 rows=40007 width=0) (actual time=2.524..2.525 rows=0 loops=1)
Buffers: shared hit=33
-> Bitmap Index Scan on nodes_depths_1 (cost=0.00..566.58 rows=20003 width=0) (actual time=0.088..0.088 rows=156 loops=1)
Index Cond: (ancestor_1 = 1)
Buffers: shared hit=4
-> Bitmap Index Scan on nodes_depths_2 (cost=0.00..566.58 rows=20003 width=0) (actual time=2.434..2.435 rows=5849 loops=1)
Index Cond: (ancestor_2 = 1)
Buffers: shared hit=29
-> Merge Right Join (cost=169565733.26..97549168801.28 rows=6491839610305 width=0) (actual time=1933.113..2294.894 rows=6005 loops=1)
Merge Cond: (nodes_fts.node_id = t.node_id)
Buffers: shared hit=3540868
-> Index Only Scan using nodes_fts_idx on nodes_fts (cost=0.43..97055.96 rows=1701569 width=4) (actual time=0.077..271.313 rows=1598712 loops=1)
Heap Fetches: 1598712
Buffers: shared hit=16805
-> Materialize (cost=169565732.84..173380940.81 rows=763041594 width=4) (actual time=1933.030..1933.903 rows=6005 loops=1)
Buffers: shared hit=3524063
-> Sort (cost=169565732.84..171473336.82 rows=763041594 width=4) (actual time=1933.026..1933.375 rows=6005 loops=1)
Sort Key: t.node_id
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=3524063
-> CTE Scan on t (cost=0.00..15260831.88 rows=763041594 width=4) (actual time=15.336..1932.145 rows=6005 loops=1)
Buffers: shared hit=3524063
Planning time: 1.154 ms
Execution time: 2295.801 ms
(40 rows)
The estimated number of rows is off from the actual numbers by orders of magnitude:
CTE Scan on t (cost=0.00..15260831.88 rows=763041594 width=4)
(actual time=15.336..1932.145 rows=6005 loops=1)
When Postgres can't accurately estimate how much work one way of executing your query is compared to another, it will generate inefficient query plans, and that is why the same query can be slow even if all the data is in RAM.
When you back up a table, the dump does not contain the statistics used by the optimizer, so after restoring from the dump you need to either wait for the autovacuum daemon or run ANALYZE manually.
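After a restore, the statistics can be rebuilt right away instead of waiting for autovacuum. A minimal sketch, run while connected to the restored database; the relation names are the ones that appear in the plans above, and whether they are the materialized views in question is an assumption:

```sql
-- Collect statistics for every table and materialized view in the current database.
ANALYZE;

-- Or target only the relations used by the slow query (names taken from the plan).
ANALYZE VERBOSE nodes;
ANALYZE VERBOSE nodes_depths;
ANALYZE VERBOSE nodes_fts;
```

Re-running EXPLAIN (ANALYZE, BUFFERS) afterwards shows whether the row estimates (for example rows=763041594 against an actual 6005) have moved closer to reality.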