Slow performance and index not taken into consideration in PostgreSQL
I'll post my query plan, view, and indexes at the bottom of the page to keep this question as clean as possible.
The issue is slow performance for a view that does not use indexes the way I would expect. I have a table with around 7 million rows that I use as the source for the view below.
I have added an index on eventdate, which is used as expected, but why is the index on manufacturerkey ignored? Which indexes would be more efficient?
Also, could the expression to_char(fe.eventdate, 'HH24:MI'::text) AS hourminutes be hurting performance?
Query plan: https://explain.dalibo.com/plan/Pvw
CREATE OR REPLACE VIEW public.v_test
AS SELECT df.facilityname,
dd.date,
dt.military_hour AS hour,
to_char(fe.eventdate, 'HH24:MI'::text) AS hourminutes,
df.tenantid,
df.tenantname,
dev.name AS event_type_name,
dtt.name AS ticket_type_name,
dde.name AS device_type_name,
count(*) AS count,
dl.country,
dl.state,
dl.district,
ds.systemmanufacturer
FROM fact_entriesexits fe
JOIN dim_facility df ON df.key = fe.facilitykey
JOIN dim_date dd ON dd.key = fe.datekey
JOIN dim_time dt ON dt.key = fe.timekey
LEFT JOIN dim_device dde ON dde.key = fe.devicekey
JOIN dim_eventtype dev ON dev.key = fe.eventtypekey
JOIN dim_tickettype dtt ON dtt.key = fe.tickettypekey
JOIN dim_licenseplate dl ON dl.key = fe.licenseplatekey
LEFT JOIN dim_systeminterface ds ON ds.key = fe.systeminterfacekey
WHERE fe.manufacturerkey = ANY (ARRAY[2, 1])
AND fe.eventdate >= '2022-01-01'
GROUP BY df.tenantname, df.tenantid, dl.region, dl.country, dl.state,
dl.district, df.facilityname, dev.name, dtt.name, dde.name,
ds.systemmanufacturer, dd.date, dt.military_hour, (to_char(fe.eventdate, 'HH24:MI'::text)), fe.licenseplatekey;
Here are the indexes the table fact_entriesexits contains:
CREATE INDEX idx_devicetype_fact_entriesexits_202008 ON public.fact_entriesexits_202008 USING btree (devicetype)
CREATE INDEX idx_etlsource_fact_entriesexits_202008 ON public.fact_entriesexits_202008 USING btree (etlsource)
CREATE INDEX idx_eventdate_fact_entriesexits_202008 ON public.fact_entriesexits_202008 USING btree (eventdate)
CREATE INDEX idx_fact_entriesexits_202008 ON public.fact_entriesexits_202008 USING btree (datekey)
CREATE INDEX idx_manufacturerkey_202008 ON public.fact_entriesexits_202008 USING btree (manufacturerkey)
Query plan:
Subquery Scan on v_lpr2 (cost=505358.60..508346.26 rows=17079 width=340) (actual time=85619.542..109797.440 rows=3008065 loops=1)
Buffers: shared hit=91037 read=366546, temp read=83669 written=83694
-> Finalize GroupAggregate (cost=505358.60..508175.47 rows=17079 width=359) (actual time=85619.539..109097.943 rows=3008065 loops=1)
Group Key: df.tenantname, df.tenantid, dl.region, dl.country, dl.state, dl.district, df.facilityname, dev.name, dtt.name, dde.name, ds.systemmanufacturer, dd.date, dt.military_hour, (to_char(fe.eventdate, 'HH24:MI'::text)), fe.licenseplatekey
Buffers: shared hit=91037 read=366546, temp read=83669 written=83694
-> Gather Merge (cost=505358.60..507392.70 rows=14232 width=359) (actual time=85619.507..105395.429 rows=3308717 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=91037 read=366546, temp read=83669 written=83694
-> Partial GroupAggregate (cost=504358.57..504749.95 rows=7116 width=359) (actual time=85169.770..94043.715 rows=1102906 loops=3)
Group Key: df.tenantname, df.tenantid, dl.region, dl.country, dl.state, dl.district, df.facilityname, dev.name, dtt.name, dde.name, ds.systemmanufacturer, dd.date, dt.military_hour, (to_char(fe.eventdate, 'HH24:MI'::text)), fe.licenseplatekey
Buffers: shared hit=91037 read=366546, temp read=83669 written=83694
-> Sort (cost=504358.57..504376.36 rows=7116 width=351) (actual time=85169.748..91995.088 rows=1500405 loops=3)
Sort Key: df.tenantname, df.tenantid, dl.region, dl.country, dl.state, dl.district, df.facilityname, dev.name, dtt.name, dde.name, ds.systemmanufacturer, dd.date, dt.military_hour, (to_char(fe.eventdate, 'HH24:MI'::text)), fe.licenseplatekey
Sort Method: external merge Disk: 218752kB
Buffers: shared hit=91037 read=366546, temp read=83669 written=83694
-> Hash Left Join (cost=3904.49..503903.26 rows=7116 width=351) (actual time=52.894..46338.295 rows=1500405 loops=3)
Hash Cond: (fe.systeminterfacekey = ds.key)
Buffers: shared hit=90979 read=366546
-> Hash Join (cost=3886.89..503848.87 rows=7116 width=321) (actual time=52.458..44551.012 rows=1500405 loops=3)
Hash Cond: (fe.licenseplatekey = dl.key)
Buffers: shared hit=90943 read=366546
-> Hash Left Join (cost=3849.10..503792.31 rows=7116 width=269) (actual time=51.406..43869.673 rows=1503080 loops=3)
Hash Cond: (fe.devicekey = dde.key)
Buffers: shared hit=90870 read=366546
-> Hash Join (cost=3405.99..503330.51 rows=7116 width=255) (actual time=47.077..43258.069 rows=1503080 loops=3)
Hash Cond: (fe.timekey = dt.key)
Buffers: shared hit=90021 read=366546
-> Hash Join (cost=570.97..500476.80 rows=7116 width=257) (actual time=6.869..42345.723 rows=1503080 loops=3)
Hash Cond: (fe.datekey = dd.key)
Buffers: shared hit=87348 read=366546
-> Hash Join (cost=166.75..500053.90 rows=7116 width=257) (actual time=2.203..41799.463 rows=1503080 loops=3)
Hash Cond: (fe.facilitykey = df.key)
Buffers: shared hit=86787 read=366546
-> Hash Join (cost=2.72..499871.14 rows=7116 width=224) (actual time=0.362..41103.372 rows=1503085 loops=3)
Hash Cond: (fe.tickettypekey = dtt.key)
Buffers: shared hit=86427 read=366546
-> Hash Join (cost=1.14..499722.81 rows=54741 width=214) (actual time=0.311..40595.537 rows=1503085 loops=3)
Hash Cond: (fe.eventtypekey = dev.key)
Buffers: shared hit=86424 read=366546
-> Append (cost=0.00..494830.25 rows=1824733 width=40) (actual time=0.266..40015.860 rows=1503085 loops=3)
Buffers: shared hit=86421 read=366546
-> Parallel Seq Scan on fact_entriesexits fe (cost=0.00..0.00 rows=1 width=40) (actual time=0.001..0.001 rows=0 loops=3)
Filter: ((manufacturerkey = ANY ('{2,1}'::integer[])) AND (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone))
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202101 on fact_entriesexits_202101 fe_25 (cost=0.42..4.28 rows=1 width=40) (actual time=0.005..0.006 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202102 on fact_entriesexits_202102 fe_26 (cost=0.42..4.27 rows=1 width=40) (actual time=0.005..0.006 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202103 on fact_entriesexits_202103 fe_27 (cost=0.42..4.24 rows=1 width=40) (actual time=0.007..0.007 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202104 on fact_entriesexits_202104 fe_28 (cost=0.42..4.05 rows=1 width=40) (actual time=0.006..0.006 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202105 on fact_entriesexits_202105 fe_29 (cost=0.43..4.12 rows=1 width=40) (actual time=0.006..0.006 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202106 on fact_entriesexits_202106 fe_30 (cost=0.43..4.19 rows=1 width=40) (actual time=0.005..0.006 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202107 on fact_entriesexits_202107 fe_31 (cost=0.43..4.28 rows=1 width=40) (actual time=0.005..0.006 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202108 on fact_entriesexits_202108 fe_32 (cost=0.43..3.83 rows=1 width=40) (actual time=0.007..0.007 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202109 on fact_entriesexits_202109 fe_33 (cost=0.43..3.40 rows=1 width=40) (actual time=0.006..0.007 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202110 on fact_entriesexits_202110 fe_34 (cost=0.43..2.77 rows=1 width=40) (actual time=0.005..0.005 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202111 on fact_entriesexits_202111 fe_35 (cost=0.43..3.21 rows=1 width=40) (actual time=0.005..0.005 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Index Scan using idx_eventdate_fact_entriesexits_202112 on fact_entriesexits_202112 fe_36 (cost=0.43..3.45 rows=1 width=40) (actual time=0.004..0.004 rows=0 loops=3)
Index Cond: (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone)
Filter: (manufacturerkey = ANY ('{2,1}'::integer[]))
Buffers: shared hit=3
-> Parallel Seq Scan on fact_entriesexits_202201 fe_37 (cost=0.00..382550.76 rows=445931 width=40) (actual time=0.032..39090.092 rows=379902 loops=3)
Filter: ((manufacturerkey = ANY ('{2,1}'::integer[])) AND (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 298432
Buffers: shared hit=3286 read=366546
-> Parallel Seq Scan on fact_entriesexits_202204 fe_38 (cost=0.00..39567.99 rows=469653 width=40) (actual time=0.015..242.895 rows=375639 loops=3)
Filter: ((manufacturerkey = ANY ('{2,1}'::integer[])) AND (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 158868
Buffers: shared hit=29546
-> Parallel Seq Scan on fact_entriesexits_202202 fe_39 (cost=0.00..30846.99 rows=437343 width=40) (actual time=0.019..230.952 rows=357451 loops=3)
Filter: ((manufacturerkey = ANY ('{2,1}'::integer[])) AND (eventdate >= '2022-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 98708
Buffers: shared hit=22294
I think you'll get the most benefit out of creating a composite index for querying with both eventdate and manufacturerkey; e.g.:
CREATE INDEX idx_manufacturerkey_eventdate_202008
ON public.fact_entriesexits_202008 USING btree (manufacturerkey, eventdate)
Since it's a composite index, put whichever column you're more likely to query by on its own on the left side. You can then remove the separate index on that column, since the composite index covers it.
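For instance, if manufacturerkey ends up as the leading column, the standalone index on it becomes redundant (index name taken from the listing above; a sketch, so verify what exists with \d before dropping anything):

```sql
-- The composite (manufacturerkey, eventdate) index can satisfy any lookup
-- on manufacturerkey alone, so the single-column index only adds write
-- overhead at that point.
DROP INDEX public.idx_manufacturerkey_202008;
```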
As for the to_char on eventdate: while you could create an expression index for that calculation, you might get better performance by splitting the query into a grouped CTE and a join. In other words, limit the GROUP BY to the columns that actually define your unique groups, then join that query with the tables you need for the final selection of columns.
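A rough sketch of that idea, assuming the fact table's surrogate keys alone identify the groups (column names are taken from the view above; the exact key set may need adjusting for your data):

```sql
WITH grouped AS (
    -- Aggregate on the fact table's keys only; no dimension columns yet.
    SELECT fe.facilitykey, fe.datekey, fe.timekey, fe.devicekey,
           fe.eventtypekey, fe.tickettypekey, fe.licenseplatekey,
           fe.systeminterfacekey,
           to_char(fe.eventdate, 'HH24:MI') AS hourminutes,
           count(*) AS count
    FROM fact_entriesexits fe
    WHERE fe.manufacturerkey IN (1, 2)
      AND fe.eventdate >= '2022-01-01'
    GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
)
-- Join the (much smaller) aggregated result to the dimensions afterwards.
SELECT df.facilityname, dd.date, dt.military_hour AS hour, g.hourminutes,
       df.tenantid, df.tenantname, dev.name AS event_type_name,
       dtt.name AS ticket_type_name, dde.name AS device_type_name,
       g.count, dl.country, dl.state, dl.district, ds.systemmanufacturer
FROM grouped g
JOIN dim_facility df ON df.key = g.facilitykey
JOIN dim_date dd ON dd.key = g.datekey
JOIN dim_time dt ON dt.key = g.timekey
LEFT JOIN dim_device dde ON dde.key = g.devicekey
JOIN dim_eventtype dev ON dev.key = g.eventtypekey
JOIN dim_tickettype dtt ON dtt.key = g.tickettypekey
JOIN dim_licenseplate dl ON dl.key = g.licenseplatekey
LEFT JOIN dim_systeminterface ds ON ds.key = g.systeminterfacekey;
```

This keeps the expensive sort/aggregate step working on narrow integer keys instead of wide text columns, and the dimension joins then run against far fewer rows.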
Your slowest seq scan step is returning over half the rows of its partition: it removes 298432 rows and returns 379902 (roughly three times those counts in total, across the parallel workers). An index is unlikely to help when that large a fraction of the table is returned anyway.
Note that that partition also seems to be massively bloated. It is hard to see what else would make it so slow and require so many buffer reads relative to the number of rows.
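One way to check that hypothesis with standard catalog queries (pgstattuple, if installed, gives exact dead-tuple numbers):

```sql
-- Compare the partition's on-disk size with its estimated live row count.
SELECT relname,
       reltuples::bigint                     AS est_rows,
       pg_size_pretty(pg_relation_size(oid)) AS heap_size
FROM pg_class
WHERE relname = 'fact_entriesexits_202201';

-- If the heap is far larger than est_rows times the average row width
-- would suggest, VACUUM (FULL) rewrites the table and reclaims the space
-- (it takes an exclusive lock, so schedule it accordingly).
VACUUM (FULL, ANALYZE) fact_entriesexits_202201;
```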
Related
Append Cost very high on Partitioned table
I have a query joining two tables partitioned on a timestamp column. Both tables are filtered down to the current date's partition, but the query is unusually slow, with the Append cost of the driving table very high. Query and plan: https://explain.dalibo.com/plan/wVA
Nested Loop (cost=0.56..174042.82 rows=16 width=494) (actual time=0.482..20.133 rows=1713 loops=1)
  Output: tran.transaction_status, mgwor.apx_transaction_id, org.organisation_name, mgwor.order_status, mgwor.request_date, mgwor.response_date, (date_part('epoch'::text, mgwor.response_date) - date_part('epoch'::text, mgwor.request_date))
  Buffers: shared hit=5787 dirtied=3
  -> Nested Loop (cost=0.42..166837.32 rows=16 width=337) (actual time=0.459..7.803 rows=1713 loops=1)
       Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, org.organisation_name
       Join Filter: ((account.account_id)::text = (mgwor.account_id)::text)
       Rows Removed by Join Filter: 3007
       Buffers: shared hit=589
       -> Nested Loop (cost=0.27..40.66 rows=4 width=54) (actual time=0.203..0.483 rows=2 loops=1)
            Output: account.account_id, org.organisation_name
            Join Filter: ((account.organisation_id)::text = (org.organisation_id)::text)
            Rows Removed by Join Filter: 289
            Buffers: shared hit=27
            -> Index Scan using account_pkey on mdm.account (cost=0.27..32.55 rows=285 width=65) (actual time=0.013..0.122 rows=291 loops=1)
                 Output: account.account_id, account.account_created_at, account.account_name, account.account_status, account.account_valid_until, account.currency_id, account.organisation_id, account.organisation_psp_id, account."account_threeDS_required", account.account_use_webhook, account.account_webhook_url, account.account_webhook_max_attempt, account.reporting_account_id, account.card_type, account.country_id, account.product_id
                 Buffers: shared hit=24
            -> Materialize (cost=0.00..3.84 rows=1 width=55) (actual time=0.000..0.000 rows=1 loops=291)
                 Output: org.organisation_name, org.organisation_id
                 Buffers: shared hit=3
                 -> Seq Scan on mdm.organisation_smd org (cost=0.00..3.84 rows=1 width=55) (actual time=0.017..0.023 rows=1 loops=1)
                      Output: org.organisation_name, org.organisation_id
                      Filter: ((org.organisation_name)::text = 'ABC'::text)
                      Rows Removed by Filter: 67
                      Buffers: shared hit=3
       -> Materialize (cost=0.15..166576.15 rows=3835 width=473) (actual time=0.127..2.826 rows=2360 loops=2)
            Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
            Buffers: shared hit=562
            -> Append (cost=0.15..166556.97 rows=3835 width=473) (actual time=0.252..3.661 rows=2360 loops=1)
                 Buffers: shared hit=562
                 Subplans Removed: 1460
                 -> Bitmap Heap Scan on public.mgworderrequest_part_20200612 mgwor (cost=50.98..672.23 rows=2375 width=91) (actual time=0.251..2.726 rows=2360 loops=1)
                      Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
                      Recheck Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
                      Filter: ((mgwor.request_date >= date(now())) AND (mgwor.request_date < (date(now()) + 1)))
                      Heap Blocks: exact=549
                      Buffers: shared hit=562
                      -> Bitmap Index Scan on mgworderrequest_part_20200612_request_type_idx (cost=0.00..50.38 rows=2375 width=0) (actual time=0.191..0.192 rows=2361 loops=1)
                           Index Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
                           Buffers: shared hit=13
  -> Append (cost=0.14..435.73 rows=1461 width=316) (actual time=0.005..0.006 rows=1 loops=1713)
       Buffers: shared hit=5198 dirtied=3
       Subplans Removed: 1460
       -> Index Scan using transaction_part_20200612_pkey on public.transaction_part_20200612 tran (cost=0.29..0.87 rows=1 width=42) (actual time=0.004..0.005 rows=1 loops=1713)
            Output: tran.transaction_status, tran.transaction_id
            Index Cond: (((tran.transaction_id)::text = (mgwor.apx_transaction_id)::text) AND (tran.transaction_created_at >= date(now())) AND (tran.transaction_created_at < (date(now()) + 1)))
            Filter: (tran.transaction_status IS NOT NULL)
            Buffers: shared hit=5198 dirtied=3
Planning Time: 19535.308 ms
Execution Time: 21.006 ms
Partition pruning is working on both tables. Am I missing something obvious here? Thanks, VA
I don't know why the cost estimate for the append is so large, but presumably you are really worried about how long this takes, not how large the estimate is. As noted, the actual time is going to planning, not to execution. A likely explanation is that it was waiting on a lock. Time spent waiting on a table lock for a partition table (but not for the parent table) gets attributed to planning time.
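If you want to confirm the lock theory, two standard tools help (nothing here is specific to this schema):

```sql
-- Log any lock wait that exceeds deadlock_timeout (1s by default).
SET log_lock_waits = on;

-- While the query is stuck in planning, look in another session for
-- locks that have been requested but not granted:
SELECT pid, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE NOT granted;
```

If an ungranted lock on one of the partitions shows up while planning is in progress, the long "planning time" is really lock-wait time.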
Postgres Optimizer: Why it lies about costs? [EDIT] How to pick random_page_cost?
I've got the following issue with Postgres. There are two tables, A and B:
A has 64 million records
B has 16 million records
A has a b_id field, which is indexed --> ix_A_b_id
B has a datetime_field, which is indexed --> ix_B_datetime
And the following query:
SELECT A.id, B.some_field
FROM A
JOIN B ON A.b_id = B.id
WHERE B.datetime_field BETWEEN 'from' AND 'to'
This query is fine when the difference between from and to is small; in that case Postgres uses both indexes and I get results quite fast. When the difference between the dates is bigger, the query slows down a lot, because Postgres decides to use only ix_B_datetime and then a full scan of the table with 64M records, which is just stupid.
I found the point where the optimizer decides that a full scan is faster: for dates between 2019-03-10 17:05:00 and 2019-03-15 01:00:00 it gets a cost similar to 2019-03-10 17:00:00 and 2019-03-15 01:00:00, but the fetch time for the first query is about 50 ms while the second takes almost 2 minutes. Plans are below:
Nested Loop (cost=1.00..3484455.17 rows=113057 width=8)
  -> Index Scan using ix_B_datetime on B (cost=0.44..80197.62 rows=28561 width=12)
       Index Cond: ((datetime_field >= '2019-03-10 17:05:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
  -> Index Scan using ix_A_b_id on A (cost=0.56..112.18 rows=701 width=12)
       Index Cond: (b_id = B.id)
Hash Join (cost=80615.72..3450771.89 rows=113148 width=8)
  Hash Cond: (A.b_id = B.id)
  -> Seq Scan on spot (cost=0.00..3119079.50 rows=66652050 width=12)
  -> Hash (cost=80258.42..80258.42 rows=28584 width=12)
       -> Index Scan using ix_B_datetime on B (cost=0.44..80258.42 rows=28584 width=12)
            Index Cond: ((datetime_field >= '2019-03-10 17:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
So my question is: why does my Postgres lie about costs? Why does it calculate something as cheaper than it actually is? How do I fix that?
As a temporary measure I had to rewrite the query to always use the index on table A. I don't like this solution, because it's hacky and unclear, and it's slower for small chunks of data, though much faster for bigger chunks:
WITH cc AS (
    SELECT id, some_field
    FROM B
    WHERE B.datetime_field >= '2019-03-08'
      AND B.datetime_field < '2019-03-15'
)
SELECT X.id, Y.some_field
FROM (SELECT b_id, id FROM A WHERE b_id IN (SELECT id FROM cc)) X
JOIN (SELECT id, some_field FROM cc) Y ON X.b_id = Y.id
EDIT: As #a_horse_with_no_name suggested, I've played with RANDOM_PAGE_COST. I modified the query to count the number of entries, because fetching all rows was unnecessary:
SELECT count(*)
FROM (
    SELECT A.id, B.some_field
    FROM A
    JOIN B ON A.b_id = B.id
    WHERE B.datetime_field BETWEEN '2019-03-01 00:00:00' AND '2019-03-15 01:00:00'
) A
And I've tested different cost levels.
RANDOM_PAGE_COST=0.25
Aggregate (cost=3491773.34..3491773.35 rows=1 width=8) (actual time=4166.998..4166.999 rows=1 loops=1)
  Buffers: shared hit=1939402
  -> Nested Loop (cost=1.00..3490398.51 rows=549932 width=0) (actual time=0.041..3620.975 rows=2462836 loops=1)
       Buffers: shared hit=1939402
       -> Index Scan using ix_B_datetime_field on B (cost=0.44..24902.79 rows=138927 width=8) (actual time=0.013..364.018 rows=313399 loops=1)
            Index Cond: ((datetime_field >= '2019-03-01 00:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
            Buffers: shared hit=311461
       -> Index Only Scan using A_b_id_index on A (cost=0.56..17.93 rows=701 width=8) (actual time=0.004..0.007 rows=8 loops=313399)
            Index Cond: (b_id = B.id)
            Heap Fetches: 2462836
            Buffers: shared hit=1627941
Planning time: 0.316 ms
Execution time: 4167.040 ms
RANDOM_PAGE_COST=1
Aggregate (cost=3918191.39..3918191.40 rows=1 width=8) (actual time=281236.100..281236.101 rows=1 loops=1)
  Buffers: shared hit=7531789 read=2567818, temp read=693 written=693
  -> Merge Join (cost=102182.07..3916816.56 rows=549932 width=0) (actual time=243755.551..280666.992 rows=2462836 loops=1)
       Merge Cond: (A.b_id = B.id)
       Buffers: shared hit=7531789 read=2567818, temp read=693 written=693
       -> Index Only Scan using A_b_id_index on A (cost=0.56..3685479.55 rows=66652050 width=8) (actual time=0.010..263635.124 rows=64700055 loops=1)
            Heap Fetches: 64700055
            Buffers: shared hit=7220328 read=2567818
       -> Materialize (cost=101543.05..102237.68 rows=138927 width=8) (actual time=523.618..1287.145 rows=2503965 loops=1)
            Buffers: shared hit=311461, temp read=693 written=693
            -> Sort (cost=101543.05..101890.36 rows=138927 width=8) (actual time=523.616..674.736 rows=313399 loops=1)
                 Sort Key: B.id
                 Sort Method: external merge Disk: 5504kB
                 Buffers: shared hit=311461, temp read=693 written=693
                 -> Index Scan using ix_B_datetime_field on B (cost=0.44..88589.92 rows=138927 width=8) (actual time=0.013..322.016 rows=313399 loops=1)
                      Index Cond: ((datetime_field >= '2019-03-01 00:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
                      Buffers: shared hit=311461
Planning time: 0.314 ms
Execution time: 281237.202 ms
RANDOM_PAGE_COST=2
Aggregate (cost=4072947.53..4072947.54 rows=1 width=8) (actual time=166896.775..166896.776 rows=1 loops=1)
  Buffers: shared hit=696849 read=2067171, temp read=194524 written=194516
  -> Hash Join (cost=175785.69..4071572.70 rows=549932 width=0) (actual time=29321.835..166332.812 rows=2462836 loops=1)
       Hash Cond: (A.B_id = B.id)
       Buffers: shared hit=696849 read=2067171, temp read=194524 written=194516
       -> Seq Scan on A (cost=0.00..3119079.50 rows=66652050 width=8) (actual time=0.008..108959.789 rows=64700055 loops=1)
            Buffers: shared hit=437580 read=2014979
       -> Hash (cost=173506.11..173506.11 rows=138927 width=8) (actual time=29321.416..29321.416 rows=313399 loops=1)
            Buckets: 131072 (originally 131072) Batches: 8 (originally 2) Memory Usage: 4084kB
            Buffers: shared hit=259269 read=52192, temp written=803
            -> Index Scan using ix_B_datetime_field on B (cost=0.44..173506.11 rows=138927 width=8) (actual time=1.676..29158.413 rows=313399 loops=1)
                 Index Cond: ((datetime_field >= '2019-03-01 00:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
                 Buffers: shared hit=259269 read=52192
Planning time: 7.367 ms
Execution time: 166896.824 ms
Still it's unclear to me: cost 0.25 is best in my case, but everywhere I read that for an SSD disk it should be 1-1.5 (I'm using an AWS instance with SSD). What's weird is that the plan at cost 1 is worse than at 2 or 0.25. So what value should I pick? Is there any way to calculate it? Efficiency here is 0.25 > 2 > 1, but what about other cases? How can I be sure that 0.25, which is good for this query, won't break other queries? Do I need to write performance tests for every query I have?
Postgres' planning takes disproportionately long compared to execution
postgres 9.6 running on Amazon RDS. I have 2 tables:
aggregate events - a big table with 6 keys (ids)
campaign metadata - a small table with the campaign definitions
I join the two in order to filter on metadata like campaign name. The query produces a report of displays broken down by campaign channel and date (daily). No FK and not null. The report table has multiple lines per day per campaign (because the aggregation is based on a 6-attribute key). When I join, query planning grows to 10 s (vs 300 ms):
explain analyze select c.campaign_channel as channel, date as day, sum(displayed) as displayed
from report_campaigns c
left join events_daily r on r.campaign_id = c.c_id
where provider_id = 7726 and c.p_id = 7726 and c.campaign_name <> 'test'
  and date >= '20170513 12:00' and date <= '20170515 12:00'
group by c.campaign_channel, date;
QUERY PLAN
GroupAggregate (cost=71461.93..71466.51 rows=229 width=22) (actual time=104.189..114.788 rows=6 loops=1)
  Group Key: c.campaign_channel, r.date
  -> Sort (cost=71461.93..71462.51 rows=229 width=18) (actual time=100.263..106.402 rows=31205 loops=1)
       Sort Key: c.campaign_channel, r.date
       Sort Method: quicksort Memory: 3206kB
       -> Hash Join (cost=1092.52..71452.96 rows=229 width=18) (actual time=22.149..86.955 rows=31205 loops=1)
            Hash Cond: (r.campaign_id = c.c_id)
            -> Append (cost=0.00..70245.84 rows=29948 width=20) (actual time=21.318..71.315 rows=31205 loops=1)
                 -> Seq Scan on events_daily r (cost=0.00..0.00 rows=1 width=20) (actual time=0.005..0.005 rows=0 loops=1)
                      Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone) AND (provider_id = 7726))
                 -> Bitmap Heap Scan on events_daily_20170513 r_1 (cost=685.36..23913.63 rows=1 width=20) (actual time=17.230..17.230 rows=0 loops=1)
                      Recheck Cond: (provider_id = 7726)
                      Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                      Rows Removed by Filter: 13769
                      Heap Blocks: exact=10276
                      -> Bitmap Index Scan on events_daily_20170513_full_idx (cost=0.00..685.36 rows=14525 width=0) (actual time=2.356..2.356 rows=13769 loops=1)
                           Index Cond: (provider_id = 7726)
                 -> Bitmap Heap Scan on events_daily_20170514 r_2 (cost=689.08..22203.52 rows=14537 width=20) (actual time=4.082..21.389 rows=15281 loops=1)
                      Recheck Cond: (provider_id = 7726)
                      Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                      Heap Blocks: exact=10490
                      -> Bitmap Index Scan on events_daily_20170514_full_idx (cost=0.00..685.45 rows=14537 width=0) (actual time=2.428..2.428 rows=15281 loops=1)
                           Index Cond: (provider_id = 7726)
                 -> Bitmap Heap Scan on events_daily_20170515 r_3 (cost=731.84..24128.69 rows=15409 width=20) (actual time=4.297..22.662 rows=15924 loops=1)
                      Recheck Cond: (provider_id = 7726)
                      Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                      Heap Blocks: exact=11318
                      -> Bitmap Index Scan on events_daily_20170515_full_idx (cost=0.00..727.99 rows=15409 width=0) (actual time=2.506..2.506 rows=15924 loops=1)
                           Index Cond: (provider_id = 7726)
            -> Hash (cost=1085.35..1085.35 rows=574 width=14) (actual time=0.815..0.815 rows=582 loops=1)
                 Buckets: 1024 Batches: 1 Memory Usage: 37kB
                 -> Bitmap Heap Scan on report_campaigns c (cost=12.76..1085.35 rows=574 width=14) (actual time=0.090..0.627 rows=582 loops=1)
                      Recheck Cond: (p_id = 7726)
                      Filter: ((campaign_name)::text <> 'test'::text)
                      Heap Blocks: exact=240
                      -> Bitmap Index Scan on report_campaigns_provider_id (cost=0.00..12.62 rows=577 width=0) (actual time=0.062..0.062 rows=582 loops=1)
                           Index Cond: (p_id = 7726)
Planning time: 9651.605 ms
Execution time: 115.092 ms
result:
 channel  |         day         | displayed
----------+---------------------+-----------
 Pin      | 2017-05-14 00:00:00 |     43434
 Pin      | 2017-05-15 00:00:00 | 3325325235
It seems to me this is because the summation forces pre-computation before the left join. A solution could be to push the filtering WHERE clauses into two nested sub-SELECTs prior to the left join and summation. Hope this works:
SELECT channel, day, sum(displayed)
FROM (SELECT campaign_channel AS channel, date AS day, displayed, p_id AS c_id
      FROM report_campaigns
      WHERE p_id = 7726
        AND campaign_name <> 'test'
        AND date >= '20170513 12:00' AND date <= '20170515 12:00') AS c
LEFT JOIN (SELECT * FROM events_daily WHERE provider_id = 7726) AS r
       ON r.campaign_id = c.c_id
GROUP BY channel, day;
PostgreSQL performance difference with datetime comparison
I'm trying to optimize the performance of my PostgreSQL queries. I noticed a big change in the time required to execute my query when I change the datetime in the query by one second, and I'm trying to figure out why there's such a drastic change in performance from such a small change in the query. I ran EXPLAIN (ANALYZE, BUFFERS) and can see a difference in how the two plans operate, but I don't understand enough to determine what to do about it. Any help? Here is the first query:
SELECT avg(travel_time_all)
FROM tt_data
WHERE date_time >= '2014-01-01 08:00:00' AND date_time < '2014-01-01 8:14:13'
  AND (tmc = '118P04252' OR tmc = '118P04253' OR tmc = '118P04254' OR tmc = '118P04255' OR tmc = '118P04256')
GROUP BY tmc
ORDER BY tmc
If I increase the later date_time by one second, to 2014-01-01 8:14:14, and rerun the query, the execution time increases drastically. Here are the results of EXPLAIN (ANALYZE, BUFFERS) for the two queries. First query:
GroupAggregate (cost=6251.99..6252.01 rows=1 width=14) (actual time=0.829..0.829 rows=1 loops=1)
  Buffers: shared hit=506
  -> Sort (cost=6251.99..6252.00 rows=1 width=14) (actual time=0.823..0.823 rows=1 loops=1)
       Sort Key: tmc
       Sort Method: quicksort Memory: 25kB
       Buffers: shared hit=506
       -> Bitmap Heap Scan on tt_data (cost=36.29..6251.98 rows=1 width=14) (actual time=0.309..0.817 rows=1 loops=1)
            Recheck Cond: ((date_time >= '2014-01-01 08:00:00'::timestamp without time zone) AND (date_time < '2014-01-01 08:14:13'::timestamp without time zone))
            Filter: ((tmc = '118P04252'::text) OR (tmc = '118P04253'::text) OR (tmc = '118P04254'::text) OR (tmc = '118P04255'::text) OR (tmc = '118P04256'::text))
            Rows Removed by Filter: 989
            Buffers: shared hit=506
            -> Bitmap Index Scan on tt_data_2_date_time_idx (cost=0.00..36.29 rows=1572 width=0) (actual time=0.119..0.119 rows=990 loops=1)
                 Index Cond: ((date_time >= '2014-01-01 08:00:00'::timestamp without time zone) AND (date_time < '2014-01-01 08:14:13'::timestamp without time zone))
                 Buffers: shared hit=7
Total runtime: 0.871 ms
Below is the second query:
GroupAggregate (cost=6257.31..6257.34 rows=1 width=14) (actual time=52.444..52.444 rows=1 loops=1)
  Buffers: shared hit=2693
  -> Sort (cost=6257.31..6257.32 rows=1 width=14) (actual time=52.438..52.438 rows=1 loops=1)
       Sort Key: tmc
       Sort Method: quicksort Memory: 25kB
       Buffers: shared hit=2693
       -> Bitmap Heap Scan on tt_data (cost=6253.28..6257.30 rows=1 width=14) (actual time=52.427..52.431 rows=1 loops=1)
            Recheck Cond: ((date_time >= '2014-01-01 08:00:00'::timestamp without time zone) AND (date_time < '2014-01-01 08:14:14'::timestamp without time zone) AND ((tmc = '118P04252'::text) OR (tmc = '118P04253'::text) OR (tmc = '118P04254'::text) OR (...)
            Rows Removed by Index Recheck: 5
            Buffers: shared hit=2693
            -> BitmapAnd (cost=6253.28..6253.28 rows=1 width=0) (actual time=52.410..52.410 rows=0 loops=1)
                 Buffers: shared hit=2689
                 -> Bitmap Index Scan on tt_data_2_date_time_idx (cost=0.00..36.31 rows=1574 width=0) (actual time=0.132..0.132 rows=990 loops=1)
                      Index Cond: ((date_time >= '2014-01-01 08:00:00'::timestamp without time zone) AND (date_time < '2014-01-01 08:14:14'::timestamp without time zone))
                      Buffers: shared hit=7
                 -> BitmapOr (cost=6216.71..6216.71 rows=271178 width=0) (actual time=52.156..52.156 rows=0 loops=1)
                      Buffers: shared hit=2682
                      -> Bitmap Index Scan on tt_data_2_tmc_idx (cost=0.00..1243.34 rows=54236 width=0) (actual time=8.439..8.439 rows=125081 loops=1)
                           Index Cond: (tmc = '118P04252'::text)
                           Buffers: shared hit=483
                      -> Bitmap Index Scan on tt_data_2_tmc_idx (cost=0.00..1243.34 rows=54236 width=0) (actual time=10.257..10.257 rows=156115 loops=1)
                           Index Cond: (tmc = '118P04253'::text)
                           Buffers: shared hit=602
                      -> Bitmap Index Scan on tt_data_2_tmc_idx (cost=0.00..1243.34 rows=54236 width=0) (actual time=6.867..6.867 rows=102318 loops=1)
                           Index Cond: (tmc = '118P04254'::text)
                           Buffers: shared hit=396
                      -> Bitmap Index Scan on tt_data_2_tmc_idx (cost=0.00..1243.34 rows=54236 width=0) (actual time=13.371..13.371 rows=160566 loops=1)
                           Index Cond: (tmc = '118P04255'::text)
                           Buffers: shared hit=619
                      -> Bitmap Index Scan on tt_data_2_tmc_idx (cost=0.00..1243.34 rows=54236 width=0) (actual time=13.218..13.218 rows=150709 loops=1)
                           Index Cond: (tmc = '118P04256'::text)
                           Buffers: shared hit=582
Total runtime: 52.507 ms
Any advice on how to make the second query as fast as the first? I'd like to increase this time interval by a greater amount, but I don't want the performance to decrease.
Postgresql doesn't use partial index
PostgreSQL 9.3. I have a table with date_field:
date_field timestamp without time zone
CREATE INDEX ix__table__date_field ON table
USING btree (date_field)
WHERE date_field IS NOT NULL;
Then I've tried to use my partial index:
EXPLAIN ANALYZE SELECT count(*) FROM table WHERE date_field IS NOT NULL;
Aggregate (cost=29048.22..29048.23 rows=1 width=0) (actual time=41.714..41.714 rows=1 loops=1)
  -> Seq Scan on table (cost=0.00..28138.83 rows=363755 width=0) (actual time=41.711..41.711 rows=0 loops=1)
       Filter: (date_field IS NOT NULL)
       Rows Removed by Filter: 365583
Total runtime: 41.744 ms
But it does use the partial index when comparing dates:
EXPLAIN ANALYZE SELECT count(*) FROM table WHERE date_field > '2015-1-1';
Aggregate (cost=26345.51..26345.52 rows=1 width=0) (actual time=0.006..0.007 rows=1 loops=1)
  -> Bitmap Heap Scan on table (cost=34.60..26040.86 rows=121861 width=0) (actual time=0.005..0.005 rows=0 loops=1)
       Recheck Cond: (date_field > '2015-01-01 00:00:00'::timestamp without time zone)
       -> Bitmap Index Scan on ix__table__date_field (cost=0.00..4.13 rows=121861 width=0) (actual time=0.003..0.003 rows=0 loops=1)
            Index Cond: (date_field > '2015-01-01 00:00:00'::timestamp without time zone)
Total runtime: 0.037 ms
So why doesn't it use the index for date_field IS NOT NULL? Thanks in advance!