EXPLAIN ANALYZE in PostgreSQL: impact on query performance?

Every time I run EXPLAIN ANALYZE on a query in PostgreSQL, the execution time decreases. Why?
I need to add some indexes to the table, and with this behavior I can't be sure whether my changes actually enhance performance. What do you recommend?
Example of the results of two successive executions of explain (analyze, buffers):
my_db=# explain (analyze, buffers)
SELECT COUNT(*) AS count
FROM my_view
WHERE creation_time >= '2019-01-18 00:00:00'
AND creation_time <= '2019-01-18 09:43:36'
ORDER BY count DESC
LIMIT 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
Limit (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=24380.613..24380.614 rows=1 loops=1)
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Sort (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=24380.611..24380.611 rows=1 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Aggregate (cost=2568854.21..2568854.22 rows=1 width=8) (actual time=24380.599..24380.599 rows=1 loops=1)
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> GroupAggregate (cost=2568854.18..2568854.20 rows=1 width=28) (actual time=24339.455..24380.589 rows=40 loops=1)
Group Key: my_table.creation_time, my_table.some_field
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Sort (cost=2568854.18..2568854.18 rows=1 width=12) (actual time=24338.361..24357.171 rows=199309 loops=1)
Sort Key: my_table.creation_time, my_table.some_field
Sort Method: external merge Disk: 4496kB
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Gather (cost=1000.00..2568854.17 rows=1 width=12) (actual time=23799.237..24142.217 rows=199309 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=14915 read=2052993
-> Parallel Seq Scan on my_table (cost=0.00..2567854.07 rows=1 width=12) (actual time=23796.695..24087.574 rows=66436 loops=3)
Filter: (creation_time >= '2019-01-18 00:00:00+00'::timestamp with time zone) AND (creation_time <= '2019-01-18 09:43:36+00'::timestamp with time zone)
Rows Removed by Filter: 21818095
Buffers: shared hit=14915 read=2052993
Planning time: 10.982 ms
Execution time: 24381.544 ms
(25 rows)
my_db=# explain (analyze, buffers)
SELECT COUNT(*) AS count
FROM my_view
WHERE creation_time >= '2019-01-18 00:00:00'
AND creation_time <= '2019-01-18 09:43:36'
ORDER BY count DESC
LIMIT 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
Limit (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=6836.247..6836.248 rows=1 loops=1)
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Sort (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=6836.245..6836.246 rows=1 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Aggregate (cost=2568854.21..2568854.22 rows=1 width=8) (actual time=6836.232..6836.232 rows=1 loops=1)
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> GroupAggregate (cost=2568854.18..2568854.20 rows=1 width=28) (actual time=6792.036..6836.221 rows=40 loops=1)
Group Key: my_table.creation_time, my_table.some_field
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Sort (cost=2568854.18..2568854.18 rows=1 width=12) (actual time=6790.807..6811.469 rows=199309 loops=1)
Sort Key: my_table.creation_time, my_table.some_field
Sort Method: external merge Disk: 4496kB
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Gather (cost=1000.00..2568854.17 rows=1 width=12) (actual time=6271.571..6604.946 rows=199309 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=15181 read=2052727
-> Parallel Seq Scan on my_table (cost=0.00..2567854.07 rows=1 width=12) (actual time=6268.383..6529.416 rows=66436 loops=3)
Filter: (creation_time >= '2019-01-18 00:00:00+00'::timestamp with time zone) AND (creation_time <= '2019-01-18 09:43:36+00'::timestamp with time zone)
Rows Removed by Filter: 21818095
Buffers: shared hit=15181 read=2052727
Planning time: 0.570 ms
Execution time: 6837.137 ms
(25 rows)
Thank you.

What version of PostgreSQL is this?
There are two problems:
A missing index:
CREATE INDEX ON my_table (creation_time);
The value distribution statistics for creation_time are way off, which causes PostgreSQL to underestimate the number of result rows terribly.
This is not the immediate cause of your problem, but you should investigate what is going on there:
If ANALYZE my_table improves the estimate, make sure autoanalyze runs more often for that table (a per-table setting for this is sketched below).
If ANALYZE my_table does not help, collect better statistics:
ALTER TABLE my_table ALTER creation_time SET STATISTICS 1000;
ANALYZE my_table;
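One way to make autoanalyze fire more often for just this table is to lower its analyze threshold; a minimal sketch (the 0.02 scale factor is an assumption to tune for your table size):
ALTER TABLE my_table SET (autovacuum_analyze_scale_factor = 0.02);
-- autoanalyze then triggers after about 2% of rows change instead of the default 10%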
The reduced execution time you observe with EXPLAIN (ANALYZE) is probably due to caching effects.
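The BUFFERS numbers support that: shared hit barely changes between runs (14915 vs 15181) while the reads stay near 2 million, so most of the speedup comes from the OS page cache rather than shared_buffers. If you want to see how much of the table sits in PostgreSQL's own cache, a sketch using the stock pg_buffercache extension:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- count how many 8kB shared buffers currently hold pages of my_table
SELECT count(*) AS cached_buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'my_table';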

Related

Join on UUID column not using index for multiple values

I have a PostgreSQL database that I cloned.
Database 1 has varchar(36) as primary keys
Database 2 (the clone) has UUID as primary keys.
Both contain the same data. What I don't understand is why queries on Database 1 will use the index but Database 2 will not. Here's the query:
EXPLAIN (ANALYZE, BUFFERS)
select * from table1
INNER JOIN table2 on table1.id = table2.table1_id
where table1.id in (
'541edffc-7179-42db-8c99-727be8c9ffec',
'eaac06d3-e44e-4e4a-8e11-1cdc6e562996'
);
Database 1
Nested Loop (cost=16.13..7234.96 rows=14 width=803) (actual time=0.072..0.112 rows=8 loops=1)
Buffers: shared hit=23
-> Index Scan using table1_pk on table1 (cost=0.56..17.15 rows=2 width=540) (actual time=0.042..0.054 rows=2 loops=1)
" Index Cond: ((id)::text = ANY ('{541edffc-7179-42db-8c99-727be8c9ffec,eaac06d3-e44e-4e4a-8e11-1cdc6e562996}'::text[]))"
Buffers: shared hit=12
-> Bitmap Heap Scan on table2 (cost=15.57..3599.86 rows=904 width=263) (actual time=0.022..0.023 rows=4 loops=2)
Recheck Cond: ((table1_id)::text = (table1.id)::text)
Heap Blocks: exact=3
Buffers: shared hit=11
-> Bitmap Index Scan on table2_table1_id_fk (cost=0.00..15.34 rows=904 width=0) (actual time=0.019..0.019 rows=4 loops=2)
Index Cond: ((table1_id)::text = (table1.id)::text)
Buffers: shared hit=8
Planning:
Buffers: shared hit=416
Planning Time: 1.869 ms
Execution Time: 0.330 ms
Database 2
Gather (cost=1000.57..1801008.91 rows=14 width=740) (actual time=11.580..42863.893 rows=8 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=863 read=631539 dirtied=631979 written=2523
-> Nested Loop (cost=0.56..1800007.51 rows=6 width=740) (actual time=28573.119..42856.696 rows=3 loops=3)
Buffers: shared hit=863 read=631539 dirtied=631979 written=2523
-> Parallel Seq Scan on table1 (cost=0.00..678896.46 rows=1 width=519) (actual time=28573.112..42855.524 rows=1 loops=3)
" Filter: (id = ANY ('{541edffc-7179-42db-8c99-727be8c9ffec,eaac06d3-e44e-4e4a-8e11-1cdc6e562996}'::uuid[]))"
Rows Removed by Filter: 2976413
Buffers: shared hit=854 read=631536 dirtied=631979 written=2523
-> Index Scan using table2_table1_id_fk on table2 (cost=0.56..1117908.70 rows=320236 width=221) (actual time=1.736..1.745 rows=4 loops=2)
Index Cond: (table1_id = table1.id)
Buffers: shared hit=9 read=3
Planning:
Buffers: shared hit=376 read=15
Planning Time: 43.594 ms
Execution Time: 42864.044 ms
Some notes:
The query is orders of magnitude faster in Database 1
Having only one ID in the WHERE clause activates the index in both databases
Casting to ::uuid has no impact
I understand that these results are because the query planner calculates that the cost of the index in the UUID (Database 2) case is too high. But I'm trying to understand why it thinks that and if there's something I can do.
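A starting point for understanding the estimate is to look at what the planner knows about the column; a diagnostic sketch (table and column names as above, with fresh statistics collected first):
ANALYZE table1;
-- what the planner believes about the primary key column
SELECT null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'table1' AND attname = 'id';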

Append Cost very high on Partitioned table

I have a query joining two tables partitioned on a timestamp column. Both tables are filtered on the current date's partition, but the query is unusually slow, with the Append cost of the driving table very high.
Query and Plan : https://explain.dalibo.com/plan/wVA
Nested Loop (cost=0.56..174042.82 rows=16 width=494) (actual time=0.482..20.133 rows=1713 loops=1)
Output: tran.transaction_status, mgwor.apx_transaction_id, org.organisation_name, mgwor.order_status, mgwor.request_date, mgwor.response_date, (date_part('epoch'::text, mgwor.response_date) - date_part('epoch'::text, mgwor.request_date))
Buffers: shared hit=5787 dirtied=3
-> Nested Loop (cost=0.42..166837.32 rows=16 width=337) (actual time=0.459..7.803 rows=1713 loops=1)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, org.organisation_name
Join Filter: ((account.account_id)::text = (mgwor.account_id)::text)
Rows Removed by Join Filter: 3007
Buffers: shared hit=589
-> Nested Loop (cost=0.27..40.66 rows=4 width=54) (actual time=0.203..0.483 rows=2 loops=1)
Output: account.account_id, org.organisation_name
Join Filter: ((account.organisation_id)::text = (org.organisation_id)::text)
Rows Removed by Join Filter: 289
Buffers: shared hit=27
-> Index Scan using account_pkey on mdm.account (cost=0.27..32.55 rows=285 width=65) (actual time=0.013..0.122 rows=291 loops=1)
Output: account.account_id, account.account_created_at, account.account_name, account.account_status, account.account_valid_until, account.currency_id, account.organisation_id, account.organisation_psp_id, account."account_threeDS_required", account.account_use_webhook, account.account_webhook_url, account.account_webhook_max_attempt, account.reporting_account_id, account.card_type, account.country_id, account.product_id
Buffers: shared hit=24
-> Materialize (cost=0.00..3.84 rows=1 width=55) (actual time=0.000..0.000 rows=1 loops=291)
Output: org.organisation_name, org.organisation_id
Buffers: shared hit=3
-> Seq Scan on mdm.organisation_smd org (cost=0.00..3.84 rows=1 width=55) (actual time=0.017..0.023 rows=1 loops=1)
Output: org.organisation_name, org.organisation_id
Filter: ((org.organisation_name)::text = 'ABC'::text)
Rows Removed by Filter: 67
Buffers: shared hit=3
-> Materialize (cost=0.15..166576.15 rows=3835 width=473) (actual time=0.127..2.826 rows=2360 loops=2)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
Buffers: shared hit=562
-> Append (cost=0.15..166556.97 rows=3835 width=473) (actual time=0.252..3.661 rows=2360 loops=1)
Buffers: shared hit=562
Subplans Removed: 1460
-> Bitmap Heap Scan on public.mgworderrequest_part_20200612 mgwor (cost=50.98..672.23 rows=2375 width=91) (actual time=0.251..2.726 rows=2360 loops=1)
Output: mgwor.apx_transaction_id, mgwor.order_status, mgwor.request_date, mgwor.response_date, mgwor.account_id
Recheck Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
Filter: ((mgwor.request_date >= date(now())) AND (mgwor.request_date < (date(now()) + 1)))
Heap Blocks: exact=549
Buffers: shared hit=562
-> Bitmap Index Scan on mgworderrequest_part_20200612_request_type_idx (cost=0.00..50.38 rows=2375 width=0) (actual time=0.191..0.192 rows=2361 loops=1)
Index Cond: ((mgwor.request_type)::text = ANY ('{CARD,CARD_PAYMENT}'::text[]))
Buffers: shared hit=13
-> Append (cost=0.14..435.73 rows=1461 width=316) (actual time=0.005..0.006 rows=1 loops=1713)
Buffers: shared hit=5198 dirtied=3
Subplans Removed: 1460
-> Index Scan using transaction_part_20200612_pkey on public.transaction_part_20200612 tran (cost=0.29..0.87 rows=1 width=42) (actual time=0.004..0.005 rows=1 loops=1713)
Output: tran.transaction_status, tran.transaction_id
Index Cond: (((tran.transaction_id)::text = (mgwor.apx_transaction_id)::text) AND (tran.transaction_created_at >= date(now())) AND (tran.transaction_created_at < (date(now()) + 1)))
Filter: (tran.transaction_status IS NOT NULL)
Buffers: shared hit=5198 dirtied=3
Planning Time: 19535.308 ms
Execution Time: 21.006 ms
Partition pruning is working on both tables.
Am I missing something obvious here?
Thanks,
VA
I don't know why the cost estimate for the append is so large, but presumably you are really worried about how long this takes, not how large the estimate is. As noted, the actual time is going to planning, not to execution.
A likely explanation is that it was waiting on a lock. Time spent waiting on a table lock for a partition table (but not for the parent table) gets attributed to planning time.
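If that is the explanation, the blocker should be visible while the query sits in planning; a hedged sketch using pg_stat_activity (the wait_event columns exist in PostgreSQL 9.6 and later):
-- run from another session while the slow query is planning
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';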

Partition and Indexes

I have a table partitioned by quarter. The table name is data. Among its columns there is a date column, which has an index created on it:
create index on data (date);
Now I am trying to query the table:
justpremium=> EXPLAIN analyze SELECT sum(col_1) FROM data WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=355709.66..355709.67 rows=1 width=32) (actual time=577.072..577.072 rows=1 loops=1)
-> Gather (cost=355709.44..355709.65 rows=2 width=32) (actual time=577.005..578.418 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=354709.44..354709.45 rows=1 width=32) (actual time=573.255..573.256 rows=1 loops=3)
-> Append (cost=0.42..352031.07 rows=1071346 width=8) (actual time=15.286..524.604 rows=837204 loops=3)
-> Parallel Index Scan using data_date_idx on data (cost=0.42..8.44 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=3)
Index Cond: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
-> Parallel Seq Scan on data_y2018q4 (cost=0.00..352022.64 rows=1071345 width=8) (actual time=15.282..465.859 rows=837204 loops=3)
Filter: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
Rows Removed by Filter: 1479844
Planning time: 1.437 ms
Execution time: 578.465 ms
(13 rows)
We can see there is a Parallel Seq Scan on data_y2018q4. That seems normal to me: I have one quarter's partition and I am querying about a third of it, so a seq scan is expected.
But now let's query the partition table directly:
justpremium=> EXPLAIN analyze SELECT sum(col_1) FROM data_y2018q4 WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=286475.38..286475.39 rows=1 width=32) (actual time=277.830..277.830 rows=1 loops=1)
-> Gather (cost=286475.16..286475.37 rows=2 width=32) (actual time=277.760..279.194 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=285475.16..285475.17 rows=1 width=32) (actual time=275.950..275.950 rows=1 loops=3)
-> Parallel Index Scan using data_y2018q4_date_idx on data_y2018q4 (cost=0.43..282796.80 rows=1071345 width=8) (actual time=0.022..227.687 rows=837204 loops=3)
Index Cond: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
Planning time: 0.187 ms
Execution time: 279.233 ms
(9 rows)
Now I get an Index Scan using data_y2018q4_date_idx, and the whole query is twice as fast: 279.233 ms compared to 578.465 ms. What is the explanation for this? How can I force the planner to use the index scan when querying the data table, and so get the twice-better timing?
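One way to test whether the planner's choice alone explains the gap is to temporarily forbid seq scans; a diagnostic sketch, not a production fix:
SET enable_seqscan = off;  -- planner avoids seq scans wherever an index is usable
EXPLAIN ANALYZE
SELECT sum(col_1) FROM data
WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';
RESET enable_seqscan;
If the index plan wins consistently on your hardware, lowering random_page_cost (e.g. for SSD storage) nudges the planner toward it without the hammer.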

postgres window function trebles query time

I'm using postgres 10, and have the following query
select
count(task.id) over() as _total_ ,
json_agg(u.*) as users,
task.*
from task
left outer join taskuserlink_history tu on (task.id = tu.taskid)
left outer join "user" u on (tu.userId = u.id)
group by task.id offset 10 limit 10;
This query takes approx 800 ms to execute.
If I remove the count(task.id) over() as _total_ line, it executes in 250 ms.
I have to confess to being a complete SQL noob, so the query itself may be completely borked.
I was wondering if anyone could point out the flaws in the query and make suggestions on how to speed it up.
The number of tasks is approx 15k, with an average of 5 users per task, linked through taskuserlink
I have looked at the pgadmin "explain" diagram
but to be honest can't really figure it out yet ;)
the table definitions are
task , with id (int) as primary column
taskuserlink_history, with taskId (int) and userId (int) (both as foreign key constraints, indexed)
user, with id (int) as primary column
the query plan is as follows
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.74..12.49 rows=10 width=44) (actual time=1178.016..1178.043 rows=10 loops=1)
Buffers: shared hit=3731, temp read=6655 written=6914
-> WindowAgg (cost=4.74..10248.90 rows=13231 width=44) (actual time=1178.014..1178.040 rows=10 loops=1)
Buffers: shared hit=3731, temp read=6655 written=6914
-> GroupAggregate (cost=4.74..10083.51 rows=13231 width=36) (actual time=0.417..1049.294 rows=13255 loops=1)
Group Key: task.id
Buffers: shared hit=3731
-> Nested Loop Left Join (cost=4.74..9586.77 rows=66271 width=36) (actual time=0.103..309.372 rows=66162 loops=1)
Join Filter: (taskuserlink_history.userid = user_archive.id)
Rows Removed by Join Filter: 1182904
Buffers: shared hit=3731
-> Merge Left Join (cost=0.58..5563.22 rows=66271 width=8) (actual time=0.044..73.598 rows=66162 loops=1)
Merge Cond: (task.id = taskuserlink_history.taskid)
Buffers: shared hit=3629
-> Index Only Scan using task_pkey on task (cost=0.29..1938.30 rows=13231 width=4) (actual time=0.026..7.683 rows=13255 loops=1)
Heap Fetches: 13255
Buffers: shared hit=1810
-> Index Scan using taskuserlink_history_task_fk_idx on taskuserlink_history (cost=0.29..2764.46 rows=66271 width=8) (actual time=0.015..40.109 rows=66162 loops=1)
Filter: (timeend IS NULL)
Rows Removed by Filter: 13368
Buffers: shared hit=1819
-> Materialize (cost=4.17..50.46 rows=4 width=36) (actual time=0.000..0.001 rows=19 loops=66162)
Buffers: shared hit=102
-> Bitmap Heap Scan on user_archive (cost=4.17..50.44 rows=4 width=36) (actual time=0.050..0.305 rows=45 loops=1)
Recheck Cond: (archived_at IS NULL)
Heap Blocks: exact=11
Buffers: shared hit=102
-> Bitmap Index Scan on user_unique_username (cost=0.00..4.16 rows=4 width=0) (actual time=0.014..0.014 rows=46 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Aggregate (cost=8.30..8.31 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=45)
Buffers: shared hit=90
-> Index Scan using task_assignedto_idx on task task_1 (cost=0.29..8.30 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=45)
Index Cond: (assignedtoid = user_archive.id)
Buffers: shared hit=90
Planning time: 0.989 ms
Execution time: 1191.451 ms
(37 rows)
without the window function it is
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.74..12.36 rows=10 width=36) (actual time=0.510..1.763 rows=10 loops=1)
Buffers: shared hit=91
-> GroupAggregate (cost=4.74..10083.51 rows=13231 width=36) (actual time=0.509..1.759 rows=10 loops=1)
Group Key: task.id
Buffers: shared hit=91
-> Nested Loop Left Join (cost=4.74..9586.77 rows=66271 width=36) (actual time=0.073..0.744 rows=50 loops=1)
Join Filter: (taskuserlink_history.userid = user_archive.id)
Rows Removed by Join Filter: 361
Buffers: shared hit=91
-> Merge Left Join (cost=0.58..5563.22 rows=66271 width=8) (actual time=0.029..0.161 rows=50 loops=1)
Merge Cond: (task.id = taskuserlink_history.taskid)
Buffers: shared hit=7
-> Index Only Scan using task_pkey on task (cost=0.29..1938.30 rows=13231 width=4) (actual time=0.016..0.031 rows=11 loops=1)
Heap Fetches: 11
Buffers: shared hit=4
-> Index Scan using taskuserlink_history_task_fk_idx on taskuserlink_history (cost=0.29..2764.46 rows=66271 width=8) (actual time=0.009..0.081 rows=50 loops=1)
Filter: (timeend IS NULL)
Rows Removed by Filter: 11
Buffers: shared hit=3
-> Materialize (cost=4.17..50.46 rows=4 width=36) (actual time=0.001..0.009 rows=8 loops=50)
Buffers: shared hit=84
-> Bitmap Heap Scan on user_archive (cost=4.17..50.44 rows=4 width=36) (actual time=0.040..0.382 rows=38 loops=1)
Recheck Cond: (archived_at IS NULL)
Heap Blocks: exact=7
Buffers: shared hit=84
-> Bitmap Index Scan on user_unique_username (cost=0.00..4.16 rows=4 width=0) (actual time=0.012..0.012 rows=46 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Aggregate (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=38)
Buffers: shared hit=76
-> Index Scan using task_assignedto_idx on task task_1 (cost=0.29..8.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=38)
Index Cond: (assignedtoid = user_archive.id)
Buffers: shared hit=76
Planning time: 0.895 ms
Execution time: 1.890 ms
(35 rows)
I believe the LIMIT clause is making the difference. LIMIT limits the number of rows returned, not necessarily the work involved:
Your second query can be aborted early after 20 rows have been constructed (10 for OFFSET and 10 for LIMIT).
However, your first query needs to go through the whole set to calculate the count(task.id).
Not what you were asking, but I say it anyway:
"user" is not a table, but a view. That is were both queries actually get slower than they should be (The "Materialize" in the plan).
Using OFFSET for paging calls for trouble because it will get slow when the OFFSET increases
Using OFFSET and LIMIT without an ORDER BY is most likely not what you want. The result sets might not be identical on consecutive calls.
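A sketch of the last two points, assuming pages are ordered by task.id (the 42 is a placeholder for the last id of the previous page):
-- fetch the total once, separately, instead of per page
SELECT count(*) AS _total_ FROM task;
-- keyset pagination: a stable order plus a seek predicate instead of OFFSET
SELECT json_agg(u.*) AS users, task.*
FROM task
LEFT JOIN taskuserlink_history tu ON task.id = tu.taskid
LEFT JOIN "user" u ON tu.userId = u.id
WHERE task.id > 42
GROUP BY task.id
ORDER BY task.id
LIMIT 10;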

PostgreSQL 9.4 Index speed very slow on Select

The box running PostgreSQL 9.4 is a 32-core system at 3.5 GHz per core with 128 GB of RAM and mirrored Samsung 850 Pro SSDs on FreeBSD with ZFS. There is no reason for performance this poor from PostgreSQL!
psql (9.4.5)
forex=# \d pair_data;
Table "public.pair_data"
Column | Type | Modifiers
------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('pair_data_id_seq'::regclass)
pair_name | character varying(6) |
pair_price | numeric(9,6) |
ts | timestamp without time zone |
Indexes:
"pair_data_pkey" PRIMARY KEY, btree (id)
"date_idx" gin (ts)
"pair_data_gin_idx1" gin (pair_name)
"pair_data_gin_idx2" gin (ts)
With this select:
forex=# explain (analyze, buffers) select pair_name as name, pair_price as price, ts as date from pair_data where pair_name = 'EURUSD' order by ts desc limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=201442.55..201442.57 rows=10 width=22) (actual time=10870.395..10870.430 rows=10 loops=1)
Buffers: shared hit=40 read=139150 dirtied=119
-> Sort (cost=201442.55..203787.44 rows=937957 width=22) (actual time=10870.390..10870.401 rows=10 loops=1)
Sort Key: ts
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=40 read=139150 dirtied=119
-> Bitmap Heap Scan on pair_data (cost=9069.17..181173.63 rows=937957 width=22) (actual time=614.976..8903.913 rows=951858 loops=1)
Recheck Cond: ((pair_name)::text = 'EURUSD'::text)
Rows Removed by Index Recheck: 13624055
Heap Blocks: exact=33464 lossy=105456
Buffers: shared hit=40 read=139150 dirtied=119
-> Bitmap Index Scan on pair_data_gin_idx1 (cost=0.00..8834.68 rows=937957 width=0) (actual time=593.701..593.701 rows=951858 loops=1)
Index Cond: ((pair_name)::text = 'EURUSD'::text)
Buffers: shared hit=40 read=230
Planning time: 0.387 ms
Execution time: 10871.419 ms
(16 rows)
Or this select:
forex=# explain (analyze, buffers)
with intervals as (
    select start, start + interval '4hr' as end
    from generate_series('2015-12-01 15:00', '2016-01-19 16:00', interval '4hr') as start
)
select distinct
    intervals.start as date,
    min(pair_price) over w as low,
    max(pair_price) over w as high,
    first_value(pair_price) over w as open,
    last_value(pair_price) over w as close
from intervals
join pair_data mb
    on mb.pair_name = 'EURUSD'
    and mb.ts >= intervals.start
    and mb.ts < intervals.end
window w as (partition by intervals.start order by mb.ts asc
             rows between unbounded preceding and unbounded following)
order by intervals.start;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=64634864.43..66198331.09 rows=815054 width=23) (actual time=1379732.924..1384602.952 rows=82 loops=1)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3332596 written=17050
CTE intervals
-> Function Scan on generate_series start (cost=0.00..12.50 rows=1000 width=8) (actual time=0.135..1.801 rows=295 loops=1)
-> Sort (cost=64634851.92..64895429.70 rows=104231111 width=23) (actual time=1379732.918..1381724.179 rows=951970 loops=1)
Sort Key: intervals.start, (min(mb.pair_price) OVER (?)), (max(mb.pair_price) OVER (?)), (first_value(mb.pair_price) OVER (?)), (last_value(mb.pair_price) OVER (?))
Sort Method: external sort Disk: 60808kB
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3332596 written=17050
-> WindowAgg (cost=41474743.35..44341098.90 rows=104231111 width=23) (actual time=1341744.405..1365946.672 rows=951970 loops=1)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3324995 written=9449
-> Sort (cost=41474743.35..41735321.12 rows=104231111 width=23) (actual time=1341743.502..1343723.884 rows=951970 loops=1)
Sort Key: intervals.start, mb.ts
Sort Method: external sort Disk: 32496kB
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1154778 written=7975
-> Nested Loop (cost=9098.12..21180990.32 rows=104231111 width=23) (actual time=271672.696..1337526.628 rows=951970 loops=1)
Join Filter: ((mb.ts >= intervals.start) AND (mb.ts < intervals."end"))
Rows Removed by Join Filter: 279879180
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1150716 written=3913
-> CTE Scan on intervals (cost=0.00..20.00 rows=1000 width=16) (actual time=0.142..4.075 rows=295 loops=1)
-> Materialize (cost=9098.12..190496.52 rows=938080 width=15) (actual time=2.125..1962.153 rows=951970 loops=295)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1150716 written=3913
-> Bitmap Heap Scan on pair_data mb (cost=9098.12..181225.12 rows=938080 width=15) (actual time=622.818..11474.987 rows=951970 loops=1)
Recheck Cond: ((pair_name)::text = 'EURUSD'::text)
Rows Removed by Index Recheck: 13623989
Heap Blocks: exact=33485 lossy=105456
Buffers: shared hit=8 read=139210 dirtied=22
-> Bitmap Index Scan on pair_data_gin_idx1 (cost=0.00..8863.60 rows=938080 width=0) (actual time=601.158..601.158 rows=951970 loops=1)
Index Cond: ((pair_name)::text = 'EURUSD'::text)
Buffers: shared hit=8 read=269
Planning time: 0.454 ms
Execution time: 1384653.385 ms
(31 rows)
The pair_data table only contains:
forex=# select count(*) from pair_data;
count
----------
21833886
(1 row)
Why is this doing heap scans when there are indexes? I do not understand what is going on in the query plan. Does anyone have an idea where the problem might be?
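Two details in the plans are worth a sketch (assumptions: PostgreSQL 9.4 as stated; the index name and work_mem value below are placeholders). A GIN index cannot return rows in sorted order, so it can never serve the ORDER BY ts DESC LIMIT 10 directly, and the lossy heap blocks in the plan (lossy=105456) mean the bitmap overflowed work_mem:
-- a btree can satisfy both the equality and the ordered LIMIT in one scan
CREATE INDEX pair_data_name_ts_idx ON pair_data (pair_name, ts DESC);
-- session-level experiment: a larger work_mem keeps the bitmap exact
SET work_mem = '256MB';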