Postgresql Version: 13.6
Server size: 2cx4g
Logs of Postgresql shows duration 6215.926ms, but explain shows that the execution time and planning time are less than 200ms
Is slow reading speed of the client a cause of the problem?
2022-04-18 16:28:20.491 CST [1159] LOG: duration: 6215.926 ms statement: EXPLAIN (ANALYZE ON, BUFFERS ON) select 1 from information_schema.tables where table_schema = 'pg_catalog' and table_name = 'pg_file_settings'
Nested Loop (cost=0.68..17.88 rows=1 width=4) (actual time=0.126..0.128 rows=1 loops=1)
Join Filter: (c.relnamespace = nc.oid)
Buffers: shared hit=8 read=2
I/O Timings: read=0.027
-> Nested Loop Left Join (cost=0.68..16.81 rows=1 width=4) (actual time=0.085..0.087 rows=1 loops=1)
Buffers: shared hit=3 read=2
I/O Timings: read=0.027
-> Index Scan using pg_class_relname_nsp_index on pg_class c (cost=0.27..8.33 rows=1 width=8) (actual time=0.065..0.066 rows=1 loops=1)
Index Cond: (relname = 'pg_file_settings'::name)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Buffers: shared hit=1 read=2
I/O Timings: read=0.027
-> Nested Loop (cost=0.40..8.47 rows=1 width=4) (actual time=0.017..0.018 rows=0 loops=1)
Buffers: shared hit=2
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.27..8.29 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
Index Cond: (oid = c.reloftype)
Buffers: shared hit=2
-> Index Only Scan using pg_namespace_oid_index on pg_namespace nt (cost=0.13..0.17 rows=1 width=4) (never executed)
Index Cond: (oid = t.typnamespace)
Heap Fetches: 0
-> Seq Scan on pg_namespace nc (cost=0.00..1.06 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=1)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (nspname = 'pg_catalog'::name))
Rows Removed by Filter: 1
Buffers: shared hit=5
Planning:
Buffers: shared hit=168 read=52 written=5
I/O Timings: read=103.513 write=43.230
Planning Time: 184.492 ms
Execution Time: 0.438 ms
Edit:
After running analyze on the database, same results:
2022-04-18 17:53:04.563 CST [16398] LOG: duration: 5987.077 ms statement: EXPLAIN (ANALYZE ON, BUFFERS ON) select 1 from information_schema.tables where table_schema = 'pg_catalog' and table_name = 'pg_file_settings'
Nested Loop (cost=0.68..17.88 rows=1 width=4) (actual time=0.125..0.127 rows=1 loops=1)
Join Filter: (c.relnamespace = nc.oid)
Buffers: shared hit=8 read=2
I/O Timings: read=0.025
-> Nested Loop Left Join (cost=0.68..16.81 rows=1 width=4) (actual time=0.088..0.090 rows=1 loops=1)
Buffers: shared hit=3 read=2
I/O Timings: read=0.025
-> Index Scan using pg_class_relname_nsp_index on pg_class c (cost=0.27..8.33 rows=1 width=8) (actual time=0.069..0.070 rows=1 loops=1)
Index Cond: (relname = 'pg_file_settings'::name)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Buffers: shared hit=1 read=2
I/O Timings: read=0.025
-> Nested Loop (cost=0.40..8.47 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1)
Buffers: shared hit=2
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.27..8.29 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: (oid = c.reloftype)
Buffers: shared hit=2
-> Index Only Scan using pg_namespace_oid_index on pg_namespace nt (cost=0.13..0.17 rows=1 width=4) (never executed)
Index Cond: (oid = t.typnamespace)
Heap Fetches: 0
-> Seq Scan on pg_namespace nc (cost=0.00..1.06 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (nspname = 'pg_catalog'::name))
Rows Removed by Filter: 1
Buffers: shared hit=5
Planning:
Buffers: shared hit=170 read=59 written=9
I/O Timings: read=105.688 write=79.773
Planning Time: 262.826 ms
Execution Time: 0.366 ms
Edit:
Analyze output after increasing shared_buffers to 25% of memory
Nested Loop (cost=0.68..17.88 rows=1 width=4) (actual time=0.096..0.098 rows=1 loops=1)
Join Filter: (c.relnamespace = nc.oid)
Buffers: shared hit=10
-> Nested Loop Left Join (cost=0.68..16.81 rows=1 width=4) (actual time=0.057..0.059 rows=1 loops=1)
Buffers: shared hit=5
-> Index Scan using pg_class_relname_nsp_index on pg_class c (cost=0.27..8.33 rows=1 width=8) (actual time=0.040..0.042 rows=1 loops=1)
Index Cond: (relname = 'pg_file_settings'::name)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Buffers: shared hit=3
-> Nested Loop (cost=0.40..8.47 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
Buffers: shared hit=2
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.27..8.29 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=1)
Index Cond: (oid = c.reloftype)
Buffers: shared hit=2
-> Index Only Scan using pg_namespace_oid_index on pg_namespace nt (cost=0.13..0.17 rows=1 width=4) (never executed)
Index Cond: (oid = t.typnamespace)
Heap Fetches: 0
-> Seq Scan on pg_namespace nc (cost=0.00..1.06 rows=1 width=4) (actual time=0.036..0.036 rows=1 loops=1)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (nspname = 'pg_catalog'::name))
Rows Removed by Filter: 1
Buffers: shared hit=5
Planning:
Buffers: shared hit=229
Planning Time: 7.866 ms
Execution Time: 0.306 ms
We are having some struggle identifying why Postgres is using too much batches to resolve a join.
Here it is the output of explain analyze of a problematic execution:
https://explain.dalibo.com/plan/xNJ#plan
Limit (cost=20880.87..20882.91 rows=48 width=205) (actual time=10722.953..10723.358 rows=48 loops=1)
-> Unique (cost=20880.87..21718.12 rows=19700 width=205) (actual time=10722.951..10723.356 rows=48 loops=1)
-> Sort (cost=20880.87..20930.12 rows=19700 width=205) (actual time=10722.950..10722.990 rows=312 loops=1)
Sort Key: titlemetadata_titlemetadata.creation_date DESC, titlemetadata_titlemetadata.id, titlemetadata_titlemetadata.title_type, titlemetadata_titlemetadata.original_title, titlemetadata_titlemetadata.alternative_ids, titlemetadata_titlemetadata.metadata,
titlemetadata_titlemetadata.is_adult, titlemetadata_titlemetadata.is_kids, titlemetadata_titlemetadata.last_modified, titlemetadata_titlemetadata.year, titlemetadata_titlemetadata.runtime, titlemetadata_titlemetadata.rating, titlemetadata_titlemetadata.video_provider, tit
lemetadata_titlemetadata.series_id_id, titlemetadata_titlemetadata.season_number, titlemetadata_titlemetadata.episode_number
Sort Method: quicksort Memory: 872kB
-> Hash Right Join (cost=13378.20..19475.68 rows=19700 width=205) (actual time=1926.352..10709.970 rows=2909 loops=1)
Hash Cond: (t4.titlemetadata_id = t3.id)
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
Rows Removed by Filter: 63248
-> Seq Scan on video_provider_offer t4 (cost=0.00..5454.90 rows=66290 width=16) (actual time=0.024..57.893 rows=66390 loops=1)
-> Hash (cost=11314.39..11314.39 rows=22996 width=221) (actual time=489.530..489.530 rows=60096 loops=1)
Buckets: 65536 (originally 32768) Batches: 32768 (originally 1) Memory Usage: 11656kB
-> Hash Right Join (cost=5380.95..11314.39 rows=22996 width=221) (actual time=130.024..225.271 rows=60096 loops=1)
Hash Cond: (video_provider_offer.titlemetadata_id = titlemetadata_titlemetadata.id)
-> Seq Scan on video_provider_offer (cost=0.00..5454.90 rows=66290 width=16) (actual time=0.011..32.950 rows=66390 loops=1)
-> Hash (cost=5129.28..5129.28 rows=20133 width=213) (actual time=129.897..129.897 rows=55793 loops=1)
Buckets: 65536 (originally 32768) Batches: 2 (originally 1) Memory Usage: 7877kB
-> Merge Left Join (cost=1.72..5129.28 rows=20133 width=213) (actual time=0.041..93.057 rows=55793 loops=1)
Merge Cond: (titlemetadata_titlemetadata.id = t3.series_id_id)
-> Index Scan using titlemetadata_titlemetadata_pkey on titlemetadata_titlemetadata (cost=1.30..4130.22 rows=20133 width=205) (actual time=0.028..62.949 rows=43921 loops=1)
Filter: ((NOT is_adult) AND (NOT (hashed SubPlan 3)) AND (((title_type)::text = 'MOV'::text) OR ((title_type)::text = 'TVS'::text) OR ((title_type)::text = 'TVP'::text) OR ((title_type)::text = 'EVT'::text)))
Rows Removed by Filter: 14121
SubPlan 3
-> Seq Scan on cable_operator_cableoperatorexcludedtitle u0_2 (cost=0.00..1.01 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
Filter: (cable_operator_id = 54)
-> Index Scan using titlemetadata_titlemetadata_series_id_id_73453db4_uniq on titlemetadata_titlemetadata t3 (cost=0.41..3901.36 rows=58037 width=16) (actual time=0.011..9.375 rows=12887 loops=1)
SubPlan 1
-> Hash Join (cost=44.62..885.73 rows=981 width=8) (actual time=0.486..36.806 rows=5757 loops=1)
Hash Cond: (w2.device_id = w3.id)
-> Nested Loop (cost=43.49..866.20 rows=2289 width=16) (actual time=0.441..33.096 rows=20180 loops=1)
-> Nested Loop (cost=43.06..414.98 rows=521 width=8) (actual time=0.426..9.952 rows=2909 loops=1)
Join Filter: (w1.id = w0.video_provider_id)
-> Nested Loop (cost=42.65..54.77 rows=13 width=24) (actual time=0.399..0.532 rows=15 loops=1)
-> HashAggregate (cost=42.50..42.95 rows=45 width=16) (actual time=0.390..0.403 rows=45 loops=1)
Group Key: v0.id
-> Nested Loop (cost=13.34..42.39 rows=45 width=16) (actual time=0.095..0.364 rows=45 loops=1)
-> Hash Semi Join (cost=13.19..32.72 rows=45 width=8) (actual time=0.084..0.229 rows=45 loops=1)
Hash Cond: (v1.id = u0.id)
-> Seq Scan on cable_operator_cableoperatorprovider v1 (cost=0.00..17.36 rows=636 width=16) (actual time=0.010..0.077 rows=636 loops=1)
-> Hash (cost=12.63..12.63 rows=45 width=8) (actual time=0.046..0.046 rows=45 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Index Scan using cable_operator_cableoperatorprovider_4d6e54b3 on cable_operator_cableoperatorprovider u0 (cost=0.28..12.63 rows=45 width=8) (actual time=0.016..0.035 rows=45 loops=1)
Index Cond: (cable_operator_id = 54)
-> Index Only Scan using video_provider_videoprovider_pkey on video_provider_videoprovider v0 (cost=0.15..0.20 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=45)
Index Cond: (id = v1.provider_id)
Heap Fetches: 45
-> Index Scan using video_provider_videoprovider_pkey on video_provider_videoprovider w1 (cost=0.15..0.25 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=45)
Index Cond: (id = v0.id)
Filter: ((video_provider_type)::text = 'VOD'::text)
Rows Removed by Filter: 1
-> Index Scan using video_provider_offer_da942d2e on video_provider_offer w0 (cost=0.42..27.22 rows=39 width=16) (actual time=0.026..0.585 rows=194 loops=15)
Index Cond: (video_provider_id = v0.id)
Filter: (((end_date > '2021-09-02 19:23:00-03'::timestamp with time zone) OR (end_date IS NULL)) AND (access_criteria && '{vtv_mas,TBX_LOGIN,urn:spkg:tve:fox-premium,urn:tve:mcp,AMCHD,AMC_CONSORCIO,ANIMAL_PLANET,ASUNTOS_PUBLI
COS,ASUNTOS_PUBLICOS_CONSORCIO,CINECANALLIVE,CINECANAL_CONSORCIO,DISCOVERY,DISCOVERY_KIDS_CONSORCIO,DISCOVERY_KIDS_OD,DISNEY,DISNEY_CH_CONSORCIO,DISNEY_XD,DISNEY_XD_CONSORCIO,EL_CANAL_HD,EL_CANAL_HD_CONSORCIO,EL_GOURMET_CONSORCIO,ESPN,ESPN2_HD_CONSORCIO,ESPN3_HD_CONSORCIO
,ESPNMAS_HD_CONSORCIO,ESPN_BASIC,ESPN_HD_CONSORCIO,ESPN_PLAY,EUROPALIVE,EUROPA_EUROPA,EUROPA_EUROPA_CONSORCIO,FILMANDARTS_DISPOSITIVOS,FILMS_ARTS,FILM_AND_ARTS_CONSORCIO,FOXLIFE,FOX_LIFE_CONSORCIO,FOX_SPORTS_1_DISPOSITIVOS,FOX_SPORTS_2_DISPOSITIVOS,FOX_SPORTS_2_HD_CONSORC
IO,FOX_SPORTS_3_DISPOSITIVOS,FOX_SPORTS_3_HD_CONSORCIO,FOX_SPORTS_HD_CONSORCIO,FRANCE24_DISPOSITIVOS,FRANCE_24_CONSORCIO,GOURMET,GOURMET_DISPOSITIVOS,HOME_HEALTH,INVESTIGATION_DISCOVERY,MAS_CHIC,NATGEOKIDS_DISPOSITIVOS,NATGEO_CONSORCIO,NATGEO_DISPOSITIVOS,NATGEO_KIDS_CONS
ORCIO,PASIONES,PASIONES_CONSORCIO,SVOD_TYC_BASIC,TBX_LOGIN,TCC_2_CONSORCIO,TCC_2_HD,TLC,TVE,TVE_CONSORCIO,TYC_SPORTS_CONSORCIO,VTV_LIVE,clarosports,discoverykids,espnplay_south_alt,urn:spkg:tve:fox-basic,urn:tve:babytv,urn:tve:cinecanal,urn:tve:discoverykids,urn:tve:foxli
fe,urn:tve:fp,urn:tve:fx,urn:tve:natgeo,urn:tve:natgeokids,urn:tve:natgeowild,urn:tve:thefilmzone}'::character varying(50)[]) AND ((((content_type)::text = 'VOD'::text) AND ((start_date < '2021-09-02 19:23:00-03'::timestamp with time zone) OR (start_date IS NULL))) OR ((c
ontent_type)::text = 'LIV'::text)))
Rows Removed by Filter: 5
-> Index Only Scan using video_provider_offer_devices_offer_id_device_id_key on video_provider_offer_devices w2 (cost=0.42..0.81 rows=6 width=16) (actual time=0.004..0.007 rows=7 loops=2909)
Index Cond: (offer_id = w0.id)
Heap Fetches: 17828
-> Hash (cost=1.10..1.10 rows=3 width=8) (actual time=0.029..0.029 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on platform_device_device w3 (cost=0.00..1.10 rows=3 width=8) (actual time=0.024..0.027 rows=2 loops=1)
Filter: ((device_code)::text = ANY ('{ANDROID,ott_dual_tcc,ott_k2_tcc}'::text[]))
Rows Removed by Filter: 5
SubPlan 2
-> Hash Join (cost=44.62..885.73 rows=981 width=8) (actual time=0.410..33.580 rows=5757 loops=1)
Hash Cond: (w2_1.device_id = w3_1.id)
-> Nested Loop (cost=43.49..866.20 rows=2289 width=16) (actual time=0.375..29.886 rows=20180 loops=1)
-> Nested Loop (cost=43.06..414.98 rows=521 width=8) (actual time=0.366..9.134 rows=2909 loops=1)
Join Filter: (w1_1.id = w0_1.video_provider_id)
-> Nested Loop (cost=42.65..54.77 rows=13 width=24) (actual time=0.343..0.476 rows=15 loops=1)
-> HashAggregate (cost=42.50..42.95 rows=45 width=16) (actual time=0.333..0.347 rows=45 loops=1)
Group Key: v0_1.id
-> Nested Loop (cost=13.34..42.39 rows=45 width=16) (actual time=0.083..0.311 rows=45 loops=1)
-> Hash Semi Join (cost=13.19..32.72 rows=45 width=8) (actual time=0.076..0.202 rows=45 loops=1)
Hash Cond: (v1_1.id = u0_1.id)
-> Seq Scan on cable_operator_cableoperatorprovider v1_1 (cost=0.00..17.36 rows=636 width=16) (actual time=0.005..0.057 rows=636 loops=1)
-> Hash (cost=12.63..12.63 rows=45 width=8) (actual time=0.038..0.038 rows=45 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Index Scan using cable_operator_cableoperatorprovider_4d6e54b3 on cable_operator_cableoperatorprovider u0_1 (cost=0.28..12.63 rows=45 width=8) (actual time=0.007..0.020 rows=45 loops=1)
Index Cond: (cable_operator_id = 54)
-> Index Only Scan using video_provider_videoprovider_pkey on video_provider_videoprovider v0_1 (cost=0.15..0.20 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=45)
Index Cond: (id = v1_1.provider_id)
Heap Fetches: 45
-> Index Scan using video_provider_videoprovider_pkey on video_provider_videoprovider w1_1 (cost=0.15..0.25 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=45)
Index Cond: (id = v0_1.id)
Filter: ((video_provider_type)::text = 'VOD'::text)
Rows Removed by Filter: 1
-> Index Scan using video_provider_offer_da942d2e on video_provider_offer w0_1 (cost=0.42..27.22 rows=39 width=16) (actual time=0.022..0.536 rows=194 loops=15)
Index Cond: (video_provider_id = v0_1.id)
Filter: (((end_date > '2021-09-02 19:23:00-03'::timestamp with time zone) OR (end_date IS NULL)) AND (access_criteria && '{vtv_mas,TBX_LOGIN,urn:spkg:tve:fox-premium,urn:tve:mcp,AMCHD,AMC_CONSORCIO,ANIMAL_PLANET,ASUNTOS_PUBLI
COS,ASUNTOS_PUBLICOS_CONSORCIO,CINECANALLIVE,CINECANAL_CONSORCIO,DISCOVERY,DISCOVERY_KIDS_CONSORCIO,DISCOVERY_KIDS_OD,DISNEY,DISNEY_CH_CONSORCIO,DISNEY_XD,DISNEY_XD_CONSORCIO,EL_CANAL_HD,EL_CANAL_HD_CONSORCIO,EL_GOURMET_CONSORCIO,ESPN,ESPN2_HD_CONSORCIO,ESPN3_HD_CONSORCIO
,ESPNMAS_HD_CONSORCIO,ESPN_BASIC,ESPN_HD_CONSORCIO,ESPN_PLAY,EUROPALIVE,EUROPA_EUROPA,EUROPA_EUROPA_CONSORCIO,FILMANDARTS_DISPOSITIVOS,FILMS_ARTS,FILM_AND_ARTS_CONSORCIO,FOXLIFE,FOX_LIFE_CONSORCIO,FOX_SPORTS_1_DISPOSITIVOS,FOX_SPORTS_2_DISPOSITIVOS,FOX_SPORTS_2_HD_CONSORC
IO,FOX_SPORTS_3_DISPOSITIVOS,FOX_SPORTS_3_HD_CONSORCIO,FOX_SPORTS_HD_CONSORCIO,FRANCE24_DISPOSITIVOS,FRANCE_24_CONSORCIO,GOURMET,GOURMET_DISPOSITIVOS,HOME_HEALTH,INVESTIGATION_DISCOVERY,MAS_CHIC,NATGEOKIDS_DISPOSITIVOS,NATGEO_CONSORCIO,NATGEO_DISPOSITIVOS,NATGEO_KIDS_CONS
ORCIO,PASIONES,PASIONES_CONSORCIO,SVOD_TYC_BASIC,TBX_LOGIN,TCC_2_CONSORCIO,TCC_2_HD,TLC,TVE,TVE_CONSORCIO,TYC_SPORTS_CONSORCIO,VTV_LIVE,clarosports,discoverykids,espnplay_south_alt,urn:spkg:tve:fox-basic,urn:tve:babytv,urn:tve:cinecanal,urn:tve:discoverykids,urn:tve:foxli
fe,urn:tve:fp,urn:tve:fx,urn:tve:natgeo,urn:tve:natgeokids,urn:tve:natgeowild,urn:tve:thefilmzone}'::character varying(50)[]) AND ((((content_type)::text = 'VOD'::text) AND ((start_date < '2021-09-02 19:23:00-03'::timestamp with time zone) OR (start_date IS NULL))) OR ((c
ontent_type)::text = 'LIV'::text)))
Rows Removed by Filter: 5
-> Index Only Scan using video_provider_offer_devices_offer_id_device_id_key on video_provider_offer_devices w2_1 (cost=0.42..0.81 rows=6 width=16) (actual time=0.003..0.006 rows=7 loops=2909)
Index Cond: (offer_id = w0_1.id)
Heap Fetches: 17828
-> Hash (cost=1.10..1.10 rows=3 width=8) (actual time=0.015..0.015 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on platform_device_device w3_1 (cost=0.00..1.10 rows=3 width=8) (actual time=0.010..0.011 rows=2 loops=1)
Filter: ((device_code)::text = ANY ('{ANDROID,ott_dual_tcc,ott_k2_tcc}'::text[]))
Rows Removed by Filter: 5
Planning time: 8.255 ms
Execution time: 10723.830 ms
(100 rows)
The weird part is that the same query, sometimes just uses a single batch. Here is an example: https://explain.dalibo.com/plan/zTv#plan
Here is the work_mem being used:
show work_mem;
work_mem
----------
8388kB
(1 row)
I'm not interested in changing the query to be more performant, but in understanding why is the different behavior.
I've found this thread apparently related with this, but I don't quite understand what are they talking about: https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com
Can anyone tell me why is this different behavior? The underlying data is the same in both cases.
If the hash is done in memory, there will only be a single batch.
A difference with the original hash batch numbers is due to Postgres choosing to increase the number of batches in order to reduce memory consumption.
You might find this EXPLAIN glossary useful (disclaimer: I'm one of the authors), here is the page on Hash Batches which also links to the PostgreSQL source code (it's very nicely documented in plain English).
While not a perfect heuristic, you can see that the memory required for the operations with multiple batches are around or above your work_mem setting. They can be lower than it, due to operations on disk generally requiring less memory overall.
I'm not 100% sure why in your exact case one was chosen over the other, but it does look like there are some very slight row estimate differences, which might be a good place to start.
As of PostgreSQL 13 there is also now a hash_mem_multiplier setting that can be used to give more memory to hashes without doing so for other operations (like sorts).
We where able to solve the problem just by doing VACUUM FULL ANALYZE;.
After that, everything started to work as expected (https://explain.depesz.com/s/eoqH#html)
Side note: we where not aware that we should do this on daily basis.
I have the following two tables.
person_addresses
address_normalization
The person_addresses table has a field named address_id as the primary key and address_normalization has the corresponding field address_id which has an index on it.
Now, when I explain the following query, I see a sequential scan.
SELECT
count(*)
FROM
mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
an.address_id = pa.address_id
WHERE
an.sr_modification_time >= 1550692189468;
-- Result: 2654
Please refer to the following screenshot.
You see that there is a sequential scan after the hash join. I'm not sure I understand this part; why would a sequential scan follow a hash join.
And as seen in the query above, the set of records returned is also low.
Is this expected behaviour or am I doing something wrong?
Update #1: I also have indices on the sr_modification_time fields of both the tables
Update #2: Full execution plan
Aggregate (cost=206944.74..206944.75 rows=1 width=0) (actual time=2807.844..2807.844 rows=1 loops=1)
Buffers: shared hit=4629 read=82217
-> Hash Join (cost=2881.95..206825.15 rows=47836 width=0) (actual time=0.775..2807.160 rows=2654 loops=1)
Hash Cond: (pa.address_id = an.address_id)
Buffers: shared hit=4629 read=82217
-> Seq Scan on person_addresses pa (cost=0.00..135924.93 rows=4911993 width=8) (actual time=0.005..1374.610 rows=4911993 loops=1)
Buffers: shared hit=4588 read=82217
-> Hash (cost=2432.05..2432.05 rows=35992 width=18) (actual time=0.756..0.756 rows=1005 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 41kB
Buffers: shared hit=41
-> Index Scan using mp_member2_address_normalization_mod_time on address_normalization an (cost=0.43..2432.05 rows=35992 width=18) (actual time=0.012..0.424 rows=1005 loops=1)
Index Cond: (sr_modification_time >= 1550692189468::bigint)
Buffers: shared hit=41
Planning time: 0.244 ms
Execution time: 2807.885 ms
Update #3: I tried with a newer timestamp and it used an index scan.
EXPLAIN (
ANALYZE
, buffers
, format TEXT
) SELECT
COUNT(*)
FROM
mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
an.address_id = pa.address_id
WHERE
an.sr_modification_time >= 1557507300342;
-- count: 1364
Query Plan:
Aggregate (cost=295.48..295.49 rows=1 width=0) (actual time=2.770..2.770 rows=1 loops=1)
Buffers: shared hit=1404
-> Nested Loop (cost=4.89..295.43 rows=19 width=0) (actual time=0.038..2.491 rows=1364 loops=1)
Buffers: shared hit=1404
-> Index Scan using mp_member2_address_normalization_mod_time on address_normalization an (cost=0.43..8.82 rows=14 width=18) (actual time=0.009..0.142 rows=341 loops=1)
Index Cond: (sr_modification_time >= 1557507300342::bigint)
Buffers: shared hit=14
-> Bitmap Heap Scan on person_addresses pa (cost=4.46..20.43 rows=4 width=8) (actual time=0.004..0.005 rows=4 loops=341)
Recheck Cond: (address_id = an.address_id)
Heap Blocks: exact=360
Buffers: shared hit=1390
-> Bitmap Index Scan on idx_mp_member2_person_addresses_address_id (cost=0.00..4.46 rows=4 width=0) (actual time=0.003..0.003 rows=4 loops=341)
Index Cond: (address_id = an.address_id)
Buffers: shared hit=1030
Planning time: 0.214 ms
Execution time: 2.816 ms
That is the expected behavior because you don't have index for sr_modification_time so after create the hash join db has to scan the whole set to check each row for the sr_modification_time value
You should create:
index for (sr_modification_time)
or composite index for (address_id , sr_modification_time )
I have a psql DB containing various Materialized Views, on running a query, i.e., query_a we complete the query execution in 2800ms and re-running the same query again we get an execution time of 53ms. This can be explained by the caching done by psql. Now comes the tricky part, I create a dump of this db and restore it in NewDB, when I re-run query_a I get an execution time of 2253ms and on re-running get the same time, i.e., it seems that the psql caching is not working on the NewDB.
I conducted various experiments to rectify the same and noticed that there is no improvement when I explicitly refresh the views but if I drop these views and re create it in my NewDB, it gives me the original performance.
Note that the data is constant in DB and NewDB and I have used the commands mentioned here for dump creation and restore.
The result for re running the query on DB is ->
The results for running the same query on NewDB for 1st and 2nd time are as follows ->
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=113790614477.61..113790614477.62 rows=1 width=8) (actual time=2284.605..2284.605 rows=1 loops=1)
Buffers: shared hit=3540872
CTE t
-> Merge Join (cost=40600.92..11846650.56 rows=763041594 width=425) (actual time=3.693..1909.916 rows=6005 loops=1)
Merge Cond: (n.node_id = nd.node_id)
Buffers: shared hit=3524063
-> Index Scan using nodes_node_id on nodes n (cost=0.43..350865.91 rows=3824099 width=389) (actual time=0.014..1651.025 rows=3598491 loops=1)
Buffers: shared hit=3523372
-> Sort (cost=40600.49..40700.26 rows=39907 width=40) (actual time=3.668..4.227 rows=6005 loops=1)
Sort Key: nd.node_id
Sort Method: quicksort Memory: 623kB
Buffers: shared hit=691
-> Bitmap Heap Scan on nodes_depths nd (cost=1153.11..37550.73 rows=39907 width=40) (actual time=0.627..2.846 rows=6005 loops=1)
Recheck Cond: ((ancestor_1 = 1) OR (ancestor_2 = 1))
Heap Blocks: exact=658
Buffers: shared hit=691
-> BitmapOr (cost=1153.11..1153.11 rows=40007 width=0) (actual time=0.547..0.547 rows=0 loops=1)
Buffers: shared hit=33
-> Bitmap Index Scan on nodes_depths_1 (cost=0.00..566.58 rows=20003 width=0) (actual time=0.032..0.032 rows=156 loops=1)
Index Cond: (ancestor_1 = 1)
Buffers: shared hit=4
-> Bitmap Index Scan on nodes_depths_2 (cost=0.00..566.58 rows=20003 width=0) (actual time=0.515..0.515 rows=5849 loops=1)
Index Cond: (ancestor_2 = 1)
Buffers: shared hit=29
-> Merge Right Join (cost=169565733.26..97549168801.28 rows=6491839610305 width=0) (actual time=1915.721..2284.175 rows=6005 loops=1)
Merge Cond: (nodes_fts.node_id = t.node_id)
Buffers: shared hit=3540872
-> Index Only Scan using nodes_fts_idx on nodes_fts (cost=0.43..97055.96 rows=1701569 width=4) (actual time=0.041..277.890 rows=1598712 loops=1)
Heap Fetches: 1598712
Buffers: shared hit=16805
-> Materialize (cost=169565732.84..173380940.81 rows=763041594 width=4) (actual time=1915.675..1916.583 rows=6005 loops=1)
Buffers: shared hit=3524067
-> Sort (cost=169565732.84..171473336.82 rows=763041594 width=4) (actual time=1915.672..1916.057 rows=6005 loops=1)
Sort Key: t.node_id
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=3524067
-> CTE Scan on t (cost=0.00..15260831.88 rows=763041594 width=4) (actual time=3.698..1914.771 rows=6005 loops=1)
Buffers: shared hit=3524063
Planning time: 68.064 ms
Execution time: 2285.084 ms
(40 rows)
and for the second run ->
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=113790614477.61..113790614477.62 rows=1 width=8) (actual time=2295.319..2295.319 rows=1 loops=1)
Buffers: shared hit=3540868
CTE t
-> Merge Join (cost=40600.92..11846650.56 rows=763041594 width=425) (actual time=15.324..1926.744 rows=6005 loops=1)
Merge Cond: (n.node_id = nd.node_id)
Buffers: shared hit=3524063
-> Index Scan using nodes_node_id on nodes n (cost=0.43..350865.91 rows=3824099 width=389) (actual time=0.027..1648.277 rows=3598491 loops=1)
Buffers: shared hit=3523372
-> Sort (cost=40600.49..40700.26 rows=39907 width=40) (actual time=15.254..15.903 rows=6005 loops=1)
Sort Key: nd.node_id
Sort Method: quicksort Memory: 623kB
Buffers: shared hit=691
-> Bitmap Heap Scan on nodes_depths nd (cost=1153.11..37550.73 rows=39907 width=40) (actual time=3.076..10.752 rows=6005 loops=1)
Recheck Cond: ((ancestor_1 = 1) OR (ancestor_2 = 1))
Heap Blocks: exact=658
Buffers: shared hit=691
-> BitmapOr (cost=1153.11..1153.11 rows=40007 width=0) (actual time=2.524..2.525 rows=0 loops=1)
Buffers: shared hit=33
-> Bitmap Index Scan on nodes_depths_1 (cost=0.00..566.58 rows=20003 width=0) (actual time=0.088..0.088 rows=156 loops=1)
Index Cond: (ancestor_1 = 1)
Buffers: shared hit=4
-> Bitmap Index Scan on nodes_depths_2 (cost=0.00..566.58 rows=20003 width=0) (actual time=2.434..2.435 rows=5849 loops=1)
Index Cond: (ancestor_2 = 1)
Buffers: shared hit=29
-> Merge Right Join (cost=169565733.26..97549168801.28 rows=6491839610305 width=0) (actual time=1933.113..2294.894 rows=6005 loops=1)
Merge Cond: (nodes_fts.node_id = t.node_id)
Buffers: shared hit=3540868
-> Index Only Scan using nodes_fts_idx on nodes_fts (cost=0.43..97055.96 rows=1701569 width=4) (actual time=0.077..271.313 rows=1598712 loops=1)
Heap Fetches: 1598712
Buffers: shared hit=16805
-> Materialize (cost=169565732.84..173380940.81 rows=763041594 width=4) (actual time=1933.030..1933.903 rows=6005 loops=1)
Buffers: shared hit=3524063
-> Sort (cost=169565732.84..171473336.82 rows=763041594 width=4) (actual time=1933.026..1933.375 rows=6005 loops=1)
Sort Key: t.node_id
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=3524063
-> CTE Scan on t (cost=0.00..15260831.88 rows=763041594 width=4) (actual time=15.336..1932.145 rows=6005 loops=1)
Buffers: shared hit=3524063
Planning time: 1.154 ms
Execution time: 2295.801 ms
(40 rows)
The estimated number of rows is off from the actual numbers by orders of magnitude:
CTE Scan on t (cost=0.00..15260831.88 rows=763041594 width=4)
(actual time=15.336..1932.145 rows=6005 loops=1)
When Postgres can't make accurate estimates of how much work a particular way of executing your query is compared to another it will generate inefficient query plans and that is why the same query can be slow even if all the data is in RAM.
When you backup a table the dump does not contain the statistics used by the optimizer so you need to wait for the autovacuum daemon or run 'ANALYZE ' manually after restoring from the dump.
I'm using postgres 10, and have the following query
select
count(task.id) over() as _total_ ,
json_agg(u.*) as users,
task.*
from task
left outer join taskuserlink_history tu on (task.id = tu.taskid)
left outer join "user" u on (tu.userId = u.id)
group by task.id offset 10 limit 10;
this query takes approx 800ms to execute
if I remove the count(task.id) over() as _total_ , line, then it executes in 250ms
I have to confess being a complete sql noob, so the query itself may be completely borked
I was wondering if anyone could point to the flaws in the query, and make suggestions on how to speed it up.
The number of tasks is approx 15k, with an average of 5 users per task, linked through taskuserlink
I have looked at the pgadmin "explain" diagram
but to be honest can't really figure it out yet ;)
the table definitions are
task , with id (int) as primary column
taskuserlink_history, with taskId (int) and userId (int) (both as foreign key constraints, indexed)
user, with id (int) as primary column
the query plan is as follows
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.74..12.49 rows=10 width=44) (actual time=1178.016..1178.043 rows=10 loops=1)
Buffers: shared hit=3731, temp read=6655 written=6914
-> WindowAgg (cost=4.74..10248.90 rows=13231 width=44) (actual time=1178.014..1178.040 rows=10 loops=1)
Buffers: shared hit=3731, temp read=6655 written=6914
-> GroupAggregate (cost=4.74..10083.51 rows=13231 width=36) (actual time=0.417..1049.294 rows=13255 loops=1)
Group Key: task.id
Buffers: shared hit=3731
-> Nested Loop Left Join (cost=4.74..9586.77 rows=66271 width=36) (actual time=0.103..309.372 rows=66162 loops=1)
Join Filter: (taskuserlink_history.userid = user_archive.id)
Rows Removed by Join Filter: 1182904
Buffers: shared hit=3731
-> Merge Left Join (cost=0.58..5563.22 rows=66271 width=8) (actual time=0.044..73.598 rows=66162 loops=1)
Merge Cond: (task.id = taskuserlink_history.taskid)
Buffers: shared hit=3629
-> Index Only Scan using task_pkey on task (cost=0.29..1938.30 rows=13231 width=4) (actual time=0.026..7.683 rows=13255 loops=1)
Heap Fetches: 13255
Buffers: shared hit=1810
-> Index Scan using taskuserlink_history_task_fk_idx on taskuserlink_history (cost=0.29..2764.46 rows=66271 width=8) (actual time=0.015..40.109 rows=66162 loops=1)
Filter: (timeend IS NULL)
Rows Removed by Filter: 13368
Buffers: shared hit=1819
-> Materialize (cost=4.17..50.46 rows=4 width=36) (actual time=0.000..0.001 rows=19 loops=66162)
Buffers: shared hit=102
-> Bitmap Heap Scan on user_archive (cost=4.17..50.44 rows=4 width=36) (actual time=0.050..0.305 rows=45 loops=1)
Recheck Cond: (archived_at IS NULL)
Heap Blocks: exact=11
Buffers: shared hit=102
-> Bitmap Index Scan on user_unique_username (cost=0.00..4.16 rows=4 width=0) (actual time=0.014..0.014 rows=46 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Aggregate (cost=8.30..8.31 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=45)
Buffers: shared hit=90
-> Index Scan using task_assignedto_idx on task task_1 (cost=0.29..8.30 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=45)
Index Cond: (assignedtoid = user_archive.id)
Buffers: shared hit=90
Planning time: 0.989 ms
Execution time: 1191.451 ms
(37 rows)
without the window function it is
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.74..12.36 rows=10 width=36) (actual time=0.510..1.763 rows=10 loops=1)
Buffers: shared hit=91
-> GroupAggregate (cost=4.74..10083.51 rows=13231 width=36) (actual time=0.509..1.759 rows=10 loops=1)
Group Key: task.id
Buffers: shared hit=91
-> Nested Loop Left Join (cost=4.74..9586.77 rows=66271 width=36) (actual time=0.073..0.744 rows=50 loops=1)
Join Filter: (taskuserlink_history.userid = user_archive.id)
Rows Removed by Join Filter: 361
Buffers: shared hit=91
-> Merge Left Join (cost=0.58..5563.22 rows=66271 width=8) (actual time=0.029..0.161 rows=50 loops=1)
Merge Cond: (task.id = taskuserlink_history.taskid)
Buffers: shared hit=7
-> Index Only Scan using task_pkey on task (cost=0.29..1938.30 rows=13231 width=4) (actual time=0.016..0.031 rows=11 loops=1)
Heap Fetches: 11
Buffers: shared hit=4
-> Index Scan using taskuserlink_history_task_fk_idx on taskuserlink_history (cost=0.29..2764.46 rows=66271 width=8) (actual time=0.009..0.081 rows=50 loops=1)
Filter: (timeend IS NULL)
Rows Removed by Filter: 11
Buffers: shared hit=3
-> Materialize (cost=4.17..50.46 rows=4 width=36) (actual time=0.001..0.009 rows=8 loops=50)
Buffers: shared hit=84
-> Bitmap Heap Scan on user_archive (cost=4.17..50.44 rows=4 width=36) (actual time=0.040..0.382 rows=38 loops=1)
Recheck Cond: (archived_at IS NULL)
Heap Blocks: exact=7
Buffers: shared hit=84
-> Bitmap Index Scan on user_unique_username (cost=0.00..4.16 rows=4 width=0) (actual time=0.012..0.012 rows=46 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Aggregate (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=38)
Buffers: shared hit=76
-> Index Scan using task_assignedto_idx on task task_1 (cost=0.29..8.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=38)
Index Cond: (assignedtoid = user_archive.id)
Buffers: shared hit=76
Planning time: 0.895 ms
Execution time: 1.890 ms
(35 rows)|
I believe the LIMIT clause is making the difference. LIMIT is limiting the number of rows returned, not neccessarily the work involved:
Your second query can be aborted early after 20 rows have been constructed (10 for OFFSET and 10 for LIMIT).
However, your first query needs to go through the whole set to calculate the count(task.id).
Not what you were asking, but I say it anyway:
"user" is not a table, but a view. That is were both queries actually get slower than they should be (The "Materialize" in the plan).
Using OFFSET for paging calls for trouble because it will get slow when the OFFSET increases
Using OFFSET and LIMIT without an ORDER BY is most likely not what you want. The result sets might not be identical on consecutive calls.