PostgreSQL 9.4 Index speed very slow on Select - postgresql

The box running PostgreSQL 9.4 is a 32 core system at 3.5ghz per core with 128GB or ram with mirrored Samsung pro 850 SSD drives on FreeBSD with ZFS. This is no reason for this poor of performance from PostgreSQL!
psql (9.4.5)
forex=# \d pair_data;
Table "public.pair_data"
Column | Type | Modifiers
------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('pair_data_id_seq'::regclass)
pair_name | character varying(6) |
pair_price | numeric(9,6) |
ts | timestamp without time zone |
Indexes:
"pair_data_pkey" PRIMARY KEY, btree (id)
"date_idx" gin (ts)
"pair_data_gin_idx1" gin (pair_name)
"pair_data_gin_idx2" gin (ts)
With this select:
forex=# explain (analyze, buffers) select pair_name as name, pair_price as price, ts as date from pair_data where pair_name = 'EURUSD' order by ts desc limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=201442.55..201442.57 rows=10 width=22) (actual time=10870.395..10870.430 rows=10 loops=1)
Buffers: shared hit=40 read=139150 dirtied=119
-> Sort (cost=201442.55..203787.44 rows=937957 width=22) (actual time=10870.390..10870.401 rows=10 loops=1)
Sort Key: ts
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=40 read=139150 dirtied=119
-> Bitmap Heap Scan on pair_data (cost=9069.17..181173.63 rows=937957 width=22) (actual time=614.976..8903.913 rows=951858 loops=1)
Recheck Cond: ((pair_name)::text = 'EURUSD'::text)
Rows Removed by Index Recheck: 13624055
Heap Blocks: exact=33464 lossy=105456
Buffers: shared hit=40 read=139150 dirtied=119
-> Bitmap Index Scan on pair_data_gin_idx1 (cost=0.00..8834.68 rows=937957 width=0) (actual time=593.701..593.701 rows=951858 loops=1)
Index Cond: ((pair_name)::text = 'EURUSD'::text)
Buffers: shared hit=40 read=230
Planning time: 0.387 ms
Execution time: 10871.419 ms
(16 rows)
Or this select:
forex=# explain (analyze, buffers) with intervals as ( select start, start + interval '4hr' as end from generate_series('2015-12-01 15:00', '2016-01-19 16:00', interval '4hr') as start ) select distinct intervals.start as date, min(pair_price) over w as low, max(pair_price) over w as high, first_value(pair_price) over w as open, last_value(pair_price) over w as close from intervals join pair_data mb on mb.pair_name = 'EURUSD' and mb.ts >= intervals.start and mb.ts < intervals.end window w as (partition by intervals.start order by mb.ts asc rows between unbounded preceding and unbounded following) order by intervals.start;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=64634864.43..66198331.09 rows=815054 width=23) (actual time=1379732.924..1384602.952 rows=82 loops=1)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3332596 written=17050
CTE intervals
-> Function Scan on generate_series start (cost=0.00..12.50 rows=1000 width=8) (actual time=0.135..1.801 rows=295 loops=1)
-> Sort (cost=64634851.92..64895429.70 rows=104231111 width=23) (actual time=1379732.918..1381724.179 rows=951970 loops=1)
Sort Key: intervals.start, (min(mb.pair_price) OVER (?)), (max(mb.pair_price) OVER (?)), (first_value(mb.pair_price) OVER (?)), (last_value(mb.pair_price) OVER (?))
Sort Method: external sort Disk: 60808kB
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3332596 written=17050
-> WindowAgg (cost=41474743.35..44341098.90 rows=104231111 width=23) (actual time=1341744.405..1365946.672 rows=951970 loops=1)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3324995 written=9449
-> Sort (cost=41474743.35..41735321.12 rows=104231111 width=23) (actual time=1341743.502..1343723.884 rows=951970 loops=1)
Sort Key: intervals.start, mb.ts
Sort Method: external sort Disk: 32496kB
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1154778 written=7975
-> Nested Loop (cost=9098.12..21180990.32 rows=104231111 width=23) (actual time=271672.696..1337526.628 rows=951970 loops=1)
Join Filter: ((mb.ts >= intervals.start) AND (mb.ts < intervals."end"))
Rows Removed by Join Filter: 279879180
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1150716 written=3913
-> CTE Scan on intervals (cost=0.00..20.00 rows=1000 width=16) (actual time=0.142..4.075 rows=295 loops=1)
-> Materialize (cost=9098.12..190496.52 rows=938080 width=15) (actual time=2.125..1962.153 rows=951970 loops=295)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1150716 written=3913
-> Bitmap Heap Scan on pair_data mb (cost=9098.12..181225.12 rows=938080 width=15) (actual time=622.818..11474.987 rows=951970 loops=1)
Recheck Cond: ((pair_name)::text = 'EURUSD'::text)
Rows Removed by Index Recheck: 13623989
Heap Blocks: exact=33485 lossy=105456
Buffers: shared hit=8 read=139210 dirtied=22
-> Bitmap Index Scan on pair_data_gin_idx1 (cost=0.00..8863.60 rows=938080 width=0) (actual time=601.158..601.158 rows=951970 loops=1)
Index Cond: ((pair_name)::text = 'EURUSD'::text)
Buffers: shared hit=8 read=269
Planning time: 0.454 ms
Execution time: 1384653.385 ms
(31 rows)
Table of pair_data only has:
forex=# select count(*) from pair_data;
count
----------
21833886
(1 row)
Why is this doing heap scans when their are indexes? I do not understand what is going on with the query plan? Dose anyone have an idea on where the problem might be?

Related

Join on UUID column not using index for multiple values

I have a PostgreSQL database that I cloned.
Database 1 has varchar(36) as primary keys
Database 2 (the clone) has UUID as primary keys.
Both contain the same data. What I don't understand is why queries on Database 1 will use the index but Database 2 will not. Here's the query:
EXPLAIN (ANALYZE, BUFFERS)
select * from table1
INNER JOIN table2 on table1.id = table2.table1_id
where table1.id in (
'541edffc-7179-42db-8c99-727be8c9ffec',
'eaac06d3-e44e-4e4a-8e11-1cdc6e562996'
);
Database 1
Nested Loop (cost=16.13..7234.96 rows=14 width=803) (actual time=0.072..0.112 rows=8 loops=1)
Buffers: shared hit=23
-> Index Scan using table1_pk on table1 (cost=0.56..17.15 rows=2 width=540) (actual time=0.042..0.054 rows=2 loops=1)
" Index Cond: ((id)::text = ANY ('{541edffc-7179-42db-8c99-727be8c9ffec,eaac06d3-e44e-4e4a-8e11-1cdc6e562996}'::text[]))"
Buffers: shared hit=12
-> Bitmap Heap Scan on table2 (cost=15.57..3599.86 rows=904 width=263) (actual time=0.022..0.023 rows=4 loops=2)
Recheck Cond: ((table1_id)::text = (table1.id)::text)
Heap Blocks: exact=3
Buffers: shared hit=11
-> Bitmap Index Scan on table2_table1_id_fk (cost=0.00..15.34 rows=904 width=0) (actual time=0.019..0.019 rows=4 loops=2)
Index Cond: ((table1_id)::text = (table1.id)::text)
Buffers: shared hit=8
Planning:
Buffers: shared hit=416
Planning Time: 1.869 ms
Execution Time: 0.330 ms
Database 2
Gather (cost=1000.57..1801008.91 rows=14 width=740) (actual time=11.580..42863.893 rows=8 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=863 read=631539 dirtied=631979 written=2523
-> Nested Loop (cost=0.56..1800007.51 rows=6 width=740) (actual time=28573.119..42856.696 rows=3 loops=3)
Buffers: shared hit=863 read=631539 dirtied=631979 written=2523
-> Parallel Seq Scan on table1 (cost=0.00..678896.46 rows=1 width=519) (actual time=28573.112..42855.524 rows=1 loops=3)
" Filter: (id = ANY ('{541edffc-7179-42db-8c99-727be8c9ffec,eaac06d3-e44e-4e4a-8e11-1cdc6e562996}'::uuid[]))"
Rows Removed by Filter: 2976413
Buffers: shared hit=854 read=631536 dirtied=631979 written=2523
-> Index Scan using table2_table1_id_fk on table2 (cost=0.56..1117908.70 rows=320236 width=221) (actual time=1.736..1.745 rows=4 loops=2)
Index Cond: (table1_id = table1.id)
Buffers: shared hit=9 read=3
Planning:
Buffers: shared hit=376 read=15
Planning Time: 43.594 ms
Execution Time: 42864.044 ms
Some notes:
The query is orders of magnitude faster in Database 1
Having only one ID in the WHERE clause activates the index in both databases
Casting to ::uuid has no impact
I understand that these results are because the query planner calculates that the cost of the index in the UUID (Database 2) case is too high. But I'm trying to understand why it thinks that and if there's something I can do.

Search results using ts_vectors and ST_Distance

We have a page where we show a list of results, and the results must be relevant given 2 factors:
keyword similarity
location
we are using postgresql postgis and ts_vectors, however, we don't know how to combine the scores coming of ts vectors and st_distance in order to have the "best" search results, the queries seem to be taking between 30 seconds and 1 minute.
SELECT [121/1808]
ts_rank_cd(doc_vectors, plainto_tsquery('Uber '), 1 | 4 | 32) AS rank, ts_headline('english', short_job_description, plainto_tsquery('Uber '), 'MaxWords=80,MinWords=50'),
-- a bunch of fields omitted...
org.logo
FROM jobs.job as job
LEFT OUTER JOIN jobs.organization as org
ON job.organization_id = org.id
WHERE job.is_expired = 0 and deleted_at is NULL and doc_vectors ## plainto_tsquery('Uber ') order by rank desc offset 80 limit 20;
Do you guys have suggestions for us?
EXPLAIN (ANALYZE, BUFFERS) for same Query:
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=886908.73..886908.81 rows=30 width=1108) (actual time=20684.508..20684.518 rows=30 loops=1)
Buffers: shared hit=1584 read=825114
-> Sort (cost=886908.68..889709.48 rows=1120318 width=1108) (actual time=20684.502..20684.509 rows=50 loops=1)
Sort Key: job.created_at DESC
Sort Method: top-N heapsort Memory: 75kB
Buffers: shared hit=1584 read=825114
-> Hash Left Join (cost=421.17..849692.52 rows=1120318 width=1108) (actual time=7.012..18887.816 rows=1111019 loops=1)
Hash Cond: (job.organization_id = org.id)
Buffers: shared hit=1581 read=825114
-> Seq Scan on job (cost=0.00..846329.53 rows=1120318 width=1001) (actual time=0.052..17866.594 rows=1111019 loops=1)
Filter: ((deleted_at IS NULL) AND (is_expired = 0) AND (is_hidden = 0))
Rows Removed by Filter: 196298
Buffers: shared hit=1564 read=824989
-> Hash (cost=264.41..264.41 rows=12541 width=107) (actual time=6.898..6.899 rows=12541 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1037kB
Buffers: shared hit=14 read=125
-> Seq Scan on organization org (cost=0.00..264.41 rows=12541 width=107) (actual time=0.021..3.860 rows=12541 loops=1)
Buffers: shared hit=14 read=125
Planning time: 2.223 ms
Execution time: 20684.682 ms```

Append Cost very high on Partitioned table

I have a query joining two tables partitioned on timestamp column. Both tables are filtered on current date partition. But query is unusually slow with APPEND Cost of the driving table very high.
Query and Plan : https://explain.dalibo.com/plan/wVA
Nested Loop (cost=0.56..174042.82 rows=16 width=494) (actual time=0.482..20.133 rows=1713 loops=1)
Output: tran.transaction_status, mgwor.apx_transaction_id, org.organisation_name, mgwor.order_status, mgwor.request_date, mgwor.response_date, (date_part('epoch'::text, mgwor.response_date) - date_part('epoch'::text, mgwor.request_date))
Buffers: shared hit=5787 dirtied=3
-> Nested Loop (cost=0.42..166837.32 rows=16 width=337) (actual time=0.459..7.803 rows=1713 loops=1)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, org.organisation_name
Join Filter: ((account.account_id)::text = (mgwor.account_id)::text)
Rows Removed by Join Filter: 3007
Buffers: shared hit=589
-> Nested Loop (cost=0.27..40.66 rows=4 width=54) (actual time=0.203..0.483 rows=2 loops=1)
Output: account.account_id, org.organisation_name
Join Filter: ((account.organisation_id)::text = (org.organisation_id)::text)
Rows Removed by Join Filter: 289
Buffers: shared hit=27
-> Index Scan using account_pkey on mdm.account (cost=0.27..32.55 rows=285 width=65) (actual time=0.013..0.122 rows=291 loops=1)
Output: account.account_id, account.account_created_at, account.account_name, account.account_status, account.account_valid_until, account.currency_id, account.organisation_id, account.organisation_psp_id, account."account_threeDS_required", account.account_use_webhook, account.account_webhook_url, account.account_webhook_max_attempt, account.reporting_account_id, account.card_type, account.country_id, account.product_id
Buffers: shared hit=24
-> Materialize (cost=0.00..3.84 rows=1 width=55) (actual time=0.000..0.000 rows=1 loops=291)
Output: org.organisation_name, org.organisation_id
Buffers: shared hit=3
-> Seq Scan on mdm.organisation_smd org (cost=0.00..3.84 rows=1 width=55) (actual time=0.017..0.023 rows=1 loops=1)
Output: org.organisation_name, org.organisation_id
Filter: ((org.organisation_name)::text = 'ABC'::text)
Rows Removed by Filter: 67
Buffers: shared hit=3
-> Materialize (cost=0.15..166576.15 rows=3835 width=473) (actual time=0.127..2.826 rows=2360 loops=2)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
Buffers: shared hit=562
-> Append (cost=0.15..166556.97 rows=3835 width=473) (actual time=0.252..3.661 rows=2360 loops=1)
Buffers: shared hit=562
Subplans Removed: 1460
-> Bitmap Heap Scan on public.mgworderrequest_part_20200612 mgwor (cost=50.98..672.23 rows=2375 width=91) (actual time=0.251..2.726 rows=2360 loops=1)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
Recheck Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
Filter: ((mgwor.request_date >= date(now())) AND (mgwor.request_date < (date(now()) + 1)))
Heap Blocks: exact=549
Buffers: shared hit=562
-> Bitmap Index Scan on mgworderrequest_part_20200612_request_type_idx (cost=0.00..50.38 rows=2375 width=0) (actual time=0.191..0.192 rows=2361 loops=1)
Index Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
Buffers: shared hit=13
-> Append (cost=0.14..435.73 rows=1461 width=316) (actual time=0.005..0.006 rows=1 loops=1713)
Buffers: shared hit=5198 dirtied=3
Subplans Removed: 1460
-> Index Scan using transaction_part_20200612_pkey on public.transaction_part_20200612 tran (cost=0.29..0.87 rows=1 width=42) (actual time=0.004..0.005 rows=1 loops=1713)
Output: tran.transaction_status, tran.transaction_id
Index Cond: (((tran.transaction_id)::text = (mgwor.apx_transaction_id)::text) AND (tran.transaction_created_at >= date(now())) AND (tran.transaction_created_at < (date(now()) + 1)))
Filter: (tran.transaction_status IS NOT NULL)
Buffers: shared hit=5198 dirtied=3
Planning Time: 19535.308 ms
Execution Time: 21.006 ms
Partition pruning is working on both the tables.
Am I missing something obvious here?
Thanks,
VA
I don't know why the cost estimate for the append is so large, but presumably you are really worried about how long this takes, not how large the estimate is. As noted, the actual time is going to planning, not to execution.
A likely explanation is that it was waiting on a lock. Time spent waiting on a table lock for a partition table (but not for the parent table) gets attributed to planning time.

PostgreSQL: Sequential scan despite having indexes

I have the following two tables.
person_addresses
address_normalization
The person_addresses table has a field named address_id as the primary key and address_normalization has the corresponding field address_id which has an index on it.
Now, when I explain the following query, I see a sequential scan.
SELECT
count(*)
FROM
mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
an.address_id = pa.address_id
WHERE
an.sr_modification_time >= 1550692189468;
-- Result: 2654
Please refer to the following screenshot.
You see that there is a sequential scan after the hash join. I'm not sure I understand this part; why would a sequential scan follow a hash join.
And as seen in the query above, the set of records returned is also low.
Is this expected behaviour or am I doing something wrong?
Update #1: I also have indices on the sr_modification_time fields of both the tables
Update #2: Full execution plan
Aggregate (cost=206944.74..206944.75 rows=1 width=0) (actual time=2807.844..2807.844 rows=1 loops=1)
Buffers: shared hit=4629 read=82217
-> Hash Join (cost=2881.95..206825.15 rows=47836 width=0) (actual time=0.775..2807.160 rows=2654 loops=1)
Hash Cond: (pa.address_id = an.address_id)
Buffers: shared hit=4629 read=82217
-> Seq Scan on person_addresses pa (cost=0.00..135924.93 rows=4911993 width=8) (actual time=0.005..1374.610 rows=4911993 loops=1)
Buffers: shared hit=4588 read=82217
-> Hash (cost=2432.05..2432.05 rows=35992 width=18) (actual time=0.756..0.756 rows=1005 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 41kB
Buffers: shared hit=41
-> Index Scan using mp_member2_address_normalization_mod_time on address_normalization an (cost=0.43..2432.05 rows=35992 width=18) (actual time=0.012..0.424 rows=1005 loops=1)
Index Cond: (sr_modification_time >= 1550692189468::bigint)
Buffers: shared hit=41
Planning time: 0.244 ms
Execution time: 2807.885 ms
Update #3: I tried with a newer timestamp and it used an index scan.
EXPLAIN (
ANALYZE
, buffers
, format TEXT
) SELECT
COUNT(*)
FROM
mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
an.address_id = pa.address_id
WHERE
an.sr_modification_time >= 1557507300342;
-- count: 1364
Query Plan:
Aggregate (cost=295.48..295.49 rows=1 width=0) (actual time=2.770..2.770 rows=1 loops=1)
Buffers: shared hit=1404
-> Nested Loop (cost=4.89..295.43 rows=19 width=0) (actual time=0.038..2.491 rows=1364 loops=1)
Buffers: shared hit=1404
-> Index Scan using mp_member2_address_normalization_mod_time on address_normalization an (cost=0.43..8.82 rows=14 width=18) (actual time=0.009..0.142 rows=341 loops=1)
Index Cond: (sr_modification_time >= 1557507300342::bigint)
Buffers: shared hit=14
-> Bitmap Heap Scan on person_addresses pa (cost=4.46..20.43 rows=4 width=8) (actual time=0.004..0.005 rows=4 loops=341)
Recheck Cond: (address_id = an.address_id)
Heap Blocks: exact=360
Buffers: shared hit=1390
-> Bitmap Index Scan on idx_mp_member2_person_addresses_address_id (cost=0.00..4.46 rows=4 width=0) (actual time=0.003..0.003 rows=4 loops=341)
Index Cond: (address_id = an.address_id)
Buffers: shared hit=1030
Planning time: 0.214 ms
Execution time: 2.816 ms
That is the expected behavior because you don't have index for sr_modification_time so after create the hash join db has to scan the whole set to check each row for the sr_modification_time value
You should create:
index for (sr_modification_time)
or composite index for (address_id , sr_modification_time )

EXPLAIN ANALYZE in PostgreSQL : impact on the query performance?

Every time I run an EXPLAIN ANALYZE over a query in PostgreSQL, Execution time decreases. Why?
I need to do some indexing on the table and this way I can't be sure if my actions which will enhance performance. What do you recommend me?
Example of the result of successive execution of the query explain (analyze, buffers) :
my_db=# explain (analyze, buffers)
SELECT COUNT(*) AS count
FROM my_view
WHERE creation_time >= '2019-01-18 00:00:00'
AND creation_time <= '2019-01-18 09:43:36'
ORDER BY count DESC
LIMIT 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
Limit (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=24380.613..24380.614 rows=1 loops=1)
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Sort (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=24380.611..24380.611 rows=1 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Aggregate (cost=2568854.21..2568854.22 rows=1 width=8) (actual time=24380.599..24380.599 rows=1 loops=1)
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> GroupAggregate (cost=2568854.18..2568854.20 rows=1 width=28) (actual time=24339.455..24380.589 rows=40 loops=1)
Group Key: my_table.creation_time, my_table.some_field
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Sort (cost=2568854.18..2568854.18 rows=1 width=12) (actual time=24338.361..24357.171 rows=199309 loops=1)
Sort Key: my_table.creation_time, my_table.some_field
Sort Method: external merge Disk: 4496kB
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Gather (cost=1000.00..2568854.17 rows=1 width=12) (actual time=23799.237..24142.217 rows=199309 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=14915 read=2052993
-> Parallel Seq Scan on my_table (cost=0.00..2567854.07 rows=1 width=12) (actual time=23796.695..24087.574 rows=66436 loops=3)
Filter: (creation_time >= '2019-01-18 00:00:00+00'::timestamp with time zone) AND (creation_time <= '2019-01-18 09:43:36+00'::timestamp with time zone)
Rows Removed by Filter: 21818095
Buffers: shared hit=14915 read=2052993
Planning time: 10.982 ms
Execution time: 24381.544 ms
(25 rows)
my_db=# explain (analyze, buffers)
SELECT COUNT(*) AS count
FROM my_view
WHERE creation_time >= '2019-01-18 00:00:00'
AND creation_time <= '2019-01-18 09:43:36'
ORDER BY count DESC
LIMIT 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
Limit (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=6836.247..6836.248 rows=1 loops=1)
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Sort (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=6836.245..6836.246 rows=1 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Aggregate (cost=2568854.21..2568854.22 rows=1 width=8) (actual time=6836.232..6836.232 rows=1 loops=1)
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> GroupAggregate (cost=2568854.18..2568854.20 rows=1 width=28) (actual time=6792.036..6836.221 rows=40 loops=1)
Group Key: my_table.creation_time, my_table.some_field
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Sort (cost=2568854.18..2568854.18 rows=1 width=12) (actual time=6790.807..6811.469 rows=199309 loops=1)
Sort Key: my_table.creation_time, my_table.some_field
Sort Method: external merge Disk: 4496kB
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Gather (cost=1000.00..2568854.17 rows=1 width=12) (actual time=6271.571..6604.946 rows=199309 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=15181 read=2052727
-> Parallel Seq Scan on my_table (cost=0.00..2567854.07 rows=1 width=12) (actual time=6268.383..6529.416 rows=66436 loops=3)
Filter: (creation_time >= '2019-01-18 00:00:00+00'::timestamp with time zone) AND (creation_time <= '2019-01-18 09:43:36+00'::timestamp with time zone)
Rows Removed by Filter: 21818095
Buffers: shared hit=15181 read=2052727
Planning time: 0.570 ms
Execution time: 6837.137 ms
(25 rows)
Thank you.
What version of PostgreSQL is this?
There are two problems:
A missing index:
CREATE INDEX ON my_table (creation_time);
The value distribution statistics for creation time are way off, which causes PostgreSQL to underestimate the number of result rows terribly.
This is not the immediate cause of your problem, but you should investigate what is going on there:
If ANALYZE my_table improves the estimate, see that autoanalyze runs more often for that table.
If ANALYZE my_table does not help, collect better statistics:
ALTER TABLE my_table ALTER creation_time SET STATISTICS 1000;
ANALYZE my_table;
The reduced execution time you observe with EXPLAIN (ANALYZE) is probably due to caching effects.