Indexes on table:
create index shifts_start_at_idx
on shifts (start_at);
Query 1, with AT TIME ZONE:
SELECT shifts.id
FROM shifts
JOIN stores ON shifts.store_id = stores.id AND stores.deleted_at IS NULL
JOIN cities ON stores.city_id = cities.id
WHERE TRUE
AND (shifts.start_at >= '2022-05-06 03:00:00'::timestamp AT TIME ZONE
(EXTRACT(timezone FROM cities.time_zone) * INTERVAL '1 second'))
ORDER BY shifts.start_at DESC, shifts.end_at DESC, shifts.id DESC
LIMIT 100;
Explain query 1:
Limit (cost=0.86..298.93 rows=100 width=24) (actual time=0.143..25.257 rows=100 loops=1)
-> Nested Loop (cost=0.86..1485256.59 rows=498300 width=24) (actual time=0.131..23.317 rows=100 loops=1)
Join Filter: (shifts.start_at >= timezone((date_part('timezone'::text, cities.time_zone) * '00:00:01'::interval), '2022-05-06 03:00:00'::timestamp without time zone))
-> Nested Loop (cost=0.72..1209695.67 rows=1494900 width=32) (actual time=0.096..17.621 rows=100 loops=1)
-> Index Scan Backward using shifts_admin_order_by_idx on shifts (cost=0.43..291132.79 rows=3000000 width=32) (actual time=0.036..6.780 rows=205 loops=1)
-> Index Scan using stores_id_deleted_at_null_idx on stores (cost=0.29..0.31 rows=1 width=16) (actual time=0.025..0.025 rows=0 loops=205)
Index Cond: (id = shifts.store_id)
-> Index Scan using cities_pkey on cities (cost=0.14..0.16 rows=1 width=20) (actual time=0.017..0.017 rows=1 loops=100)
Index Cond: (id = stores.city_id)
Planning Time: 0.632 ms
Execution Time: 26.436 ms
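Postgres doesn't use the index for the start_at condition: in query 1 it is only applied as a Join Filter, not as an Index Cond.
Query 2, without AT TIME ZONE: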
SELECT shifts.id
FROM shifts
JOIN stores ON shifts.store_id = stores.id AND stores.deleted_at IS NULL
JOIN cities ON stores.city_id = cities.id
WHERE TRUE
AND (shifts.start_at >= '2022-05-06 03:00:00')
ORDER BY shifts.start_at DESC, shifts.end_at DESC, shifts.id DESC
LIMIT 100;
Explain query 2:
Limit (cost=0.86..108.84 rows=100 width=24) (actual time=0.125..8.866 rows=100 loops=1)
-> Nested Loop (cost=0.86..898691.17 rows=832261 width=24) (actual time=0.115..7.886 rows=100 loops=1)
-> Nested Loop (cost=0.72..761958.37 rows=832261 width=32) (actual time=0.066..5.570 rows=100 loops=1)
-> Index Scan Backward using shifts_admin_order_by_idx on shifts (cost=0.43..248984.02 rows=1670200 width=32) (actual time=0.014..1.380 rows=205 loops=1)
Index Cond: (start_at >= '2022-05-06 03:00:00+00'::timestamp with time zone)
-> Index Scan using stores_id_deleted_at_null_idx on stores (cost=0.29..0.31 rows=1 width=16) (actual time=0.008..0.008 rows=0 loops=205)
Index Cond: (id = shifts.store_id)
-> Index Only Scan using cities_pkey on cities (cost=0.14..0.16 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=100)
Index Cond: (id = stores.city_id)
Heap Fetches: 100
Planning Time: 0.327 ms
Execution Time: 9.394 ms
It is not entirely clear why Postgres does not use the index when the time is converted to a timestamp with time zone.
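In query 1 the comparison value depends on cities.time_zone, a column from a joined table. Because shifts is scanned first (it is the outer side of the nested loops), there is no single constant the planner could turn into an Index Cond on shifts. The Python sketch below shows that each city produces a different UTC threshold. It assumes that `timestamp AT TIME ZONE interval` reads the interval as an offset east of UTC, which is how `EXTRACT(timezone ...)` reports offsets.

The helper names and sample offsets are hypothetical. `loosest_bound` illustrates one possible workaround: a constant lower bound that every per-row threshold is at or above. Such a bound could be added as an extra condition next to the exact one without changing the result.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def local_to_utc(local: datetime, offset_seconds: int) -> datetime:
    """Treat a naive timestamp as wall-clock time at a UTC offset given in
    seconds east of UTC, and return that instant in UTC. Assumed to mirror
    `local::timestamp AT TIME ZONE (offset * interval '1 second')`."""
    zone = timezone(timedelta(seconds=offset_seconds))
    return local.replace(tzinfo=zone).astimezone(timezone.utc)


def loosest_bound(local: datetime, offsets: Sequence[int]) -> datetime:
    """Hypothetical helper: earliest UTC threshold over all offsets. Every
    per-row threshold is >= this, so `start_at >= loosest_bound(...)` is
    implied by the exact per-row condition."""
    return min(local_to_utc(local, off) for off in offsets)


cutoff = datetime(2022, 5, 6, 3, 0, 0)   # the literal used in query 1
offsets = [-5 * 3600, 0, 3 * 3600]       # hypothetical city UTC offsets

for off in offsets:
    # Each city yields a different threshold (08:00, 03:00 and 00:00 UTC here).
    print(off // 3600, local_to_utc(cutoff, off).isoformat())

print("loosest:", loosest_bound(cutoff, offsets).isoformat())
```

Whether Postgres would actually use such an extra constant condition as an Index Cond would still need to be confirmed with EXPLAIN.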
Related
I have created a dummy panda table with 1,000,000 rows of random data. The table has an id primary key column of type uuid.
Using Prisma, I do cursor-based pagination forward and backward from the same id cursor. The generated forward query takes about 900 ms (~1 s); the backward one takes about 0.06 ms.
Why is the forward query slower, and is there a way to improve it? Strangely, if I replace "order_cmp"."Panda_id_0" with the actual value (second example), performance is good again: about 0.06 ms.
Forward:
explain analyze
select * from
"public"."panda",
(
select
"public"."panda"."id" as "Panda_id_0"
from
"public"."panda"
where
("public"."panda"."id") = ('ffff581b-ef0c-4cb8-9ece-25504e420348')) as "order_cmp"
where
"public"."panda"."id" >= "order_cmp"."Panda_id_0"
order by
"public"."panda"."id" asc
limit 4 offset 1;
Limit (cost=0.98..1.52 rows=4 width=86) (actual time=919.298..919.303 rows=4 loops=1)
-> Nested Loop (cost=0.85..44539.57 rows=333333 width=86) (actual time=919.295..919.301 rows=5 loops=1)
Join Filter: (panda.id >= panda_1.id)
Rows Removed by Join Filter: 999990
-> Index Scan using panda_pkey on panda (cost=0.42..29536.92 rows=1000000 width=70) (actual time=0.017..607.298 rows=999995 loops=1)
-> Materialize (cost=0.42..2.65 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=999995)
-> Index Only Scan using panda_pkey on panda panda_1 (cost=0.42..2.64 rows=1 width=16) (actual time=0.011..0.012 rows=1 loops=1)
Index Cond: (id = 'ffff581b-ef0c-4cb8-9ece-25504e420348'::uuid)
Heap Fetches: 0
Planning Time: 0.166 ms
Execution Time: 919.332 ms
The same forward query with "order_cmp"."Panda_id_0" replaced by the actual value:
explain analyze
select * from
"public"."panda",
(
select
"public"."panda"."id" as "Panda_id_0"
from
"public"."panda"
where
("public"."panda"."id") = ('ffff581b-ef0c-4cb8-9ece-25504e420348')) as "order_cmp"
where
"public"."panda"."id" >= 'ffff581b-ef0c-4cb8-9ece-25504e420348'
order by
"public"."panda"."id" asc
limit 4 offset 1;
Limit (cost=1.73..5.26 rows=4 width=86) (actual time=0.032..0.040 rows=4 loops=1)
-> Nested Loop (cost=0.85..4412.29 rows=5001 width=86) (actual time=0.013..0.038 rows=5 loops=1)
-> Index Scan using panda_pkey on panda (cost=0.42..4347.13 rows=5001 width=70) (actual time=0.007..0.028 rows=5 loops=1)
Index Cond: (id >= 'ffff581b-ef0c-4cb8-9ece-25504e420348'::uuid)
-> Materialize (cost=0.42..2.65 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=5)
-> Index Only Scan using panda_pkey on panda panda_1 (cost=0.42..2.64 rows=1 width=16) (actual time=0.003..0.004 rows=1 loops=1)
Index Cond: (id = 'ffff581b-ef0c-4cb8-9ece-25504e420348'::uuid)
Heap Fetches: 0
Planning Time: 0.224 ms
Execution Time: 0.073 ms
Unrelated, just to compare:
Backward:
explain analyze
select * from
"public"."panda",
(
select
"public"."panda"."id" as "Panda_id_0"
from
"public"."panda"
where
("public"."panda"."id") = ('ffff581b-ef0c-4cb8-9ece-25504e420348')) as "order_cmp"
where
"public"."panda"."id" <= "order_cmp"."Panda_id_0"
order by
"public"."panda"."id" desc
limit 4 offset 1;
Limit (cost=0.97..1.51 rows=4 width=85) (actual time=0.025..0.030 rows=4 loops=1)
-> Nested Loop (cost=0.83..4503.05 rows=33280 width=85) (actual time=0.023..0.029 rows=5 loops=1)
Join Filter: (panda.id <= panda_1.id)
Rows Removed by Join Filter: 2
-> Index Scan Backward using panda_pkey on panda (cost=0.42..3002.81 rows=99840 width=69) (actual time=0.011..0.017 rows=7 loops=1)
-> Materialize (cost=0.42..2.64 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=7)
-> Index Only Scan using panda_pkey on panda panda_1 (cost=0.42..2.64 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=1)
Index Cond: (id = 'ffff581b-ef0c-4cb8-9ece-25504e420348'::uuid)
Heap Fetches: 0
Planning Time: 0.099 ms
Execution Time: 0.046 ms
Offset based:
explain analyze
select * from
"public"."panda"
order by
"public"."panda"."id" asc
limit 4 offset 500000;
Limit (cost=14768.67..14768.79 rows=4 width=70) (actual time=316.159..316.161 rows=4 loops=1)
-> Index Scan using panda_pkey on panda (cost=0.42..29536.92 rows=1000000 width=70) (actual time=0.012..298.960 rows=500004 loops=1)
Planning Time: 0.051 ms
Execution Time: 316.179 ms
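The plans show that the difference lies in where `id >= cursor` is evaluated. In the slow forward plan it is a Join Filter, so Postgres walks panda_pkey from the smallest id and discards rows until it reaches the cursor (Rows Removed by Join Filter: 999990). With the literal value, the same condition becomes an Index Cond and the scan starts at the cursor.

The backward query is fast here mainly because the cursor starts with ffff. It therefore sorts near the end of the index, which is where a backward scan begins (Rows Removed by Join Filter: 2).

Below is a minimal Python simulation of the two access paths. A sorted list of integers stands in for the uuid index, and the function names are hypothetical.

```python
import bisect
from typing import List, Tuple


def page_with_join_filter(ids: List[int], cursor: int, offset: int,
                          limit: int) -> Tuple[List[int], int]:
    """Slow plan: scan the index from the start and test `id >= cursor`
    on every row. Returns (page, rows_read)."""
    matched: List[int] = []
    rows_read = 0
    for key in ids:
        rows_read += 1
        if key >= cursor:
            matched.append(key)
            if len(matched) == offset + limit:
                break
    return matched[offset:], rows_read


def page_with_index_cond(ids: List[int], cursor: int, offset: int,
                         limit: int) -> Tuple[List[int], int]:
    """Fast plan: seek straight to the cursor (like an Index Cond),
    then read only offset + limit rows. Returns (page, rows_read)."""
    start = bisect.bisect_left(ids, cursor)
    rows_read = min(len(ids) - start, offset + limit)
    return ids[start + offset:start + rows_read], rows_read


ids = list(range(100_000))   # sorted keys, standing in for panda_pkey
cursor = ids[-5]             # hypothetical cursor near the end of the index
slow = page_with_join_filter(ids, cursor, offset=1, limit=4)
fast = page_with_index_cond(ids, cursor, offset=1, limit=4)
print(slow[0], "rows read:", slow[1])
print(fast[0], "rows read:", fast[1])
```

Both functions return the same page. Only the number of rows touched differs, which mirrors the 919 ms vs 0.07 ms gap in the plans above.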
I have a complex query generated by Hibernate for JBPM. I can't really modify it, and I'm trying to optimize it as much as possible.
I found that ORDER BY DESC is much slower than ORDER BY ASC. Any idea why?
PostgreSQL Version : 9.4
Schema : https://pastebin.com/qNZhrbef
Query:
select
taskinstan0_.ID_ as ID1_27_,
taskinstan0_.VERSION_ as VERSION3_27_,
taskinstan0_.NAME_ as NAME4_27_,
taskinstan0_.DESCRIPTION_ as DESCRIPT5_27_,
taskinstan0_.ACTORID_ as ACTORID6_27_,
taskinstan0_.CREATE_ as CREATE7_27_,
taskinstan0_.START_ as START8_27_,
taskinstan0_.END_ as END9_27_,
taskinstan0_.DUEDATE_ as DUEDATE10_27_,
taskinstan0_.PRIORITY_ as PRIORITY11_27_,
taskinstan0_.ISCANCELLED_ as ISCANCE12_27_,
taskinstan0_.ISSUSPENDED_ as ISSUSPE13_27_,
taskinstan0_.ISOPEN_ as ISOPEN14_27_,
taskinstan0_.ISSIGNALLING_ as ISSIGNA15_27_,
taskinstan0_.ISBLOCKING_ as ISBLOCKING16_27_,
taskinstan0_.LOCKED as LOCKED27_,
taskinstan0_.QUEUE as QUEUE27_,
taskinstan0_.TASK_ as TASK19_27_,
taskinstan0_.TOKEN_ as TOKEN20_27_,
taskinstan0_.PROCINST_ as PROCINST21_27_,
taskinstan0_.SWIMLANINSTANCE_ as SWIMLAN22_27_,
taskinstan0_.TASKMGMTINSTANCE_ as TASKMGM23_27_
from JBPM_TASKINSTANCE taskinstan0_, JBPM_VARIABLEINSTANCE stringinst1_, JBPM_PROCESSINSTANCE processins2_, JBPM_VARIABLEINSTANCE variablein3_
where stringinst1_.CLASS_='S'
and taskinstan0_.PROCINST_=processins2_.ID_
and taskinstan0_.ID_=variablein3_.TASKINSTANCE_
and variablein3_.NAME_ = 'NIR'
and taskinstan0_.QUEUE = 'ERT_TPS'
and (processins2_.ORGAPATH_ like '/ERT%')
and taskinstan0_.ISOPEN_= 't'
and variablein3_.ID_=stringinst1_.ID_
order by stringinst1_.STRINGVALUE_ ASC limit '10';
EXPLAIN result for ASC:
Limit (cost=1.71..11652.93 rows=10 width=646) (actual time=6.588..82.407 rows=10 loops=1)
-> Nested Loop (cost=1.71..6215929.27 rows=5335 width=646) (actual time=6.587..82.402 rows=10 loops=1)
-> Nested Loop (cost=1.29..6213170.78 rows=5335 width=646) (actual time=6.578..82.363 rows=10 loops=1)
-> Nested Loop (cost=1.00..6159814.66 rows=153812 width=13) (actual time=0.537..82.130 rows=149 loops=1)
-> Index Scan Backward using totoidx10 on jbpm_variableinstance stringinst1_ (cost=0.56..558481.07 rows=11199905 width=13) (actual time=0.018..11.914 rows=40182 loops=1)
Filter: (class_ = 'S'::bpchar)
-> Index Scan using jbpm_variableinstance_pkey on jbpm_variableinstance variablein3_ (cost=0.43..0.49 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=40182)
Index Cond: (id_ = stringinst1_.id_)
Filter: ((name_)::text = 'NIR'::text)
Rows Removed by Filter: 1
-> Index Scan using jbpm_taskinstance_pkey on jbpm_taskinstance taskinstan0_ (cost=0.29..0.34 rows=1 width=641) (actual time=0.001..0.001 rows=0 loops=149)
Index Cond: (id_ = variablein3_.taskinstance_)
Filter: (isopen_ AND ((queue)::text = 'ERT_TPS'::text))
Rows Removed by Filter: 0
-> Index Only Scan using idx_procin_2 on jbpm_processinstance processins2_ (cost=0.42..0.51 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=10)
Index Cond: (id_ = taskinstan0_.procinst_)
Filter: ((orgapath_)::text ~~ '/ERT%'::text)
Heap Fetches: 0
Planning time: 2.598 ms
Execution time: 82.513 ms
EXPLAIN result for DESC:
Limit (cost=1.71..11652.93 rows=10 width=646) (actual time=8144.871..8144.986 rows=10 loops=1)
-> Nested Loop (cost=1.71..6215929.27 rows=5335 width=646) (actual time=8144.870..8144.984 rows=10 loops=1)
-> Nested Loop (cost=1.29..6213170.78 rows=5335 width=646) (actual time=8144.858..8144.951 rows=10 loops=1)
-> Nested Loop (cost=1.00..6159814.66 rows=153812 width=13) (actual time=8144.838..8144.910 rows=20 loops=1)
-> Index Scan using totoidx10 on jbpm_variableinstance stringinst1_ (cost=0.56..558481.07 rows=11199905 width=13) (actual time=0.066..2351.727 rows=2619671 loops=1)
Filter: (class_ = 'S'::bpchar)
Rows Removed by Filter: 906237
-> Index Scan using jbpm_variableinstance_pkey on jbpm_variableinstance variablein3_ (cost=0.43..0.49 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=2619671)
Index Cond: (id_ = stringinst1_.id_)
Filter: ((name_)::text = 'NIR'::text)
Rows Removed by Filter: 1
-> Index Scan using jbpm_taskinstance_pkey on jbpm_taskinstance taskinstan0_ (cost=0.29..0.34 rows=1 width=641) (actual time=0.002..0.002 rows=0 loops=20)
Index Cond: (id_ = variablein3_.taskinstance_)
Filter: (isopen_ AND ((queue)::text = 'ERT_TPS'::text))
-> Index Only Scan using idx_procin_2 on jbpm_processinstance processins2_ (cost=0.42..0.51 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=10)
Index Cond: (id_ = taskinstan0_.procinst_)
Filter: ((orgapath_)::text ~~ '/ERT%'::text)
Heap Fetches: 0
Planning time: 2.080 ms
Execution time: 8145.053 ms
Table sizes:
jbpm_variableinstance 12100592 rows
jbpm_taskinstance 69913 rows
jbpm_processinstance 97546 rows
If you have any ideas, thanks in advance.
This typically only happens when OFFSET and/or LIMIT are involved (as is the case here).
The key difference is this line in the EXPLAIN output for the query with DESC:
Rows Removed by Filter: 906237
Meaning: when scanning the index totoidx10 backwards (which corresponds to your ASC ordering, obviously), Postgres reads ~40k index entries before it has 10 qualifying rows. When scanning the same index forward for DESC, it goes through ~3.5M entries (906,237 removed by the class_ filter, 2,619,671 probed in the join) before it finally finds 10 qualifying rows.
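Both plans carry the same cost estimate for the Limit node (cost=1.71..11652.93). The planner scales the full join cost down as if qualifying rows were spread evenly through the index. When they are clustered at one end, the scan direction decides how much has to be read before LIMIT is satisfied.

The minimal Python illustration below uses hypothetical data in which only the top 1% of index keys qualify. The function name is hypothetical as well.

```python
from typing import Callable, List, Sequence, Tuple


def first_k_matches(entries: Sequence[int], predicate: Callable[[int], bool],
                    k: int, backward: bool = False) -> Tuple[List[int], int]:
    """Walk index entries in order (or in reverse, like 'Index Scan Backward')
    and stop once k entries pass the predicate. Returns (matches, entries_read)."""
    ordered = reversed(entries) if backward else entries
    matches: List[int] = []
    read = 0
    for entry in ordered:
        read += 1
        if predicate(entry):
            matches.append(entry)
            if len(matches) == k:
                break
    return matches, read


def qualifies(key: int) -> bool:
    # Hypothetical: matching rows are clustered at the high end of the index.
    return key >= 99_000


entries = range(100_000)  # sorted index keys
_, read_forward = first_k_matches(entries, qualifies, k=10)
_, read_backward = first_k_matches(entries, qualifies, k=10, backward=True)
print("forward:", read_forward, "backward:", read_backward)
```

The same LIMIT 10 reads 10 entries in one direction and about 99k in the other, even though a planner assuming an even spread would cost both the same.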
A matching multicolumn index (with the right sort order) might help a lot.
Or, since Postgres chooses an unfavorable query plan, updated (or more detailed) table statistics or adjusted cost settings may be enough.
Related:
Keep PostgreSQL from sometimes choosing a bad query plan
Optimizing queries on a range of timestamps (two columns)
I am joining a table to itself to get 50 rows at a time, but it takes 20 seconds to get back 50 rows.
Select Distinct ON (S1.service) S1.account,S1.stid,S1.receiver,S1.identifier,
S1.binfo,S2.dlr_mask,S2.time,S1.msgdata,S1.msg_cost
from sql_sent_sms S1
left join sql_sent_sms S2
on S1.service = S2.service
where S2.time Between :senttime and :endtime
and S1.userid=:userid
Order By S1.service");
I have created indexes on the time, service and userid columns.
But it didn't help much.
What else can I do to speed it up?
This is the EXPLAIN ANALYZE output for the query:
EXPLAIN ANALYZE Select Distinct ON (S1.service) S1.account,S1.stid,S1.receiver,S1.identifier,S1.binfo,S2.dlr_mask,S2.time,S1.msgdata,S1.msg_cost
from sql_sent_sms S1
left join sql_sent_sms S2
on S1.service = S2.service
where S2.time Between '1459759193' and '1459849193'
and S1.userid='10412144'
Order By S1.service , S2.dlr_mask asc limit 50;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=56635220.85..56938614.70 rows=50 width=308) (actual time=1.177..1.177 rows=0 loops=1)
-> Unique (cost=56635220.85..57035700.74 rows=66 width=308) (actual time=1.176..1.176 rows=0 loops=1)
-> Sort (cost=56635220.85..56835460.79 rows=80095977 width=308) (actual time=1.176..1.176 rows=0 loops=1)
Sort Key: s1.service, s2.dlr_mask
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.24..127912.78 rows=80095977 width=308) (actual time=1.166..1.166 rows=0 loops=1)
-> Index Scan using idx_sentmsgbydate on sql_sent_sms s1 (cost=0.43..21247.62 rows=5765 width=292) (actual time=0.035..0.091 rows=34 loops=1)
Index Cond: ((userid)::text = '10412144'::text)
-> Index Only Scan using idx_sentmsgandstauts on sql_sent_sms s2 (cost=0.81..18.45 rows=5 width=126) (actual time=0.030..0.030 rows=0 loops=34)
Index Cond: ((service = (s1.service)::text) AND ("time" >= 1459759193::bigint) AND ("time" <= 1459849193::bigint))
Heap Fetches: 0
Planning time: 2.471 ms
Execution time: 1.269 ms
(13 rows)
I am using Postgres 9.1 and have a table with about 3.5M rows with eventtype (varchar) and eventtime (timestamp) columns, plus some other fields. There are only about 20 distinct eventtype values, and the event times span about 4 years.
I want to get the last timestamp of each event type. If I run a query like:
select eventtype, max(eventtime)
from allevents
group by eventtype
it takes around 20 seconds. Selecting the distinct eventtype values is equally slow. The query plan shows a full sequential scan of the table, so it is not surprising that it is slow.
EXPLAIN ANALYZE for the above query gives:
HashAggregate (cost=84591.47..84591.68 rows=21 width=21) (actual time=20918.131..20918.141 rows=21 loops=1)
-> Seq Scan on allevents (cost=0.00..66117.98 rows=3694698 width=21) (actual time=0.021..4831.793 rows=3694392 loops=1)
Total runtime: 20918.204 ms
If I add a WHERE clause to select a specific eventtype, it takes anywhere from 40 ms to 150 ms, which is at least decent.
Query plan when selecting specific eventtype:
GroupAggregate (cost=343.87..24942.71 rows=1 width=21) (actual time=98.397..98.397 rows=1 loops=1)
-> Bitmap Heap Scan on allevents (cost=343.87..24871.07 rows=14325 width=21) (actual time=6.820..89.610 rows=19736 loops=1)
Recheck Cond: ((eventtype)::text = 'TEST_EVENT'::text)
-> Bitmap Index Scan on allevents_idx2 (cost=0.00..340.28 rows=14325 width=0) (actual time=6.121..6.121 rows=19736 loops=1)
Index Cond: ((eventtype)::text = 'TEST_EVENT'::text)
Total runtime: 98.482 ms
Primary key is (eventtype, eventtime). I also have the following indexes:
allevents_idx (eventtime desc, eventtype)
allevents_idx2 (eventtype).
How can I speed up the query?
The query plan for the correlated subquery suggested by @denis below, with 14 manually entered values:
Function Scan on unnest val (cost=0.00..185.40 rows=100 width=32) (actual time=0.121..8983.134 rows=14 loops=1)
SubPlan 2
-> Result (cost=1.83..1.84 rows=1 width=0) (actual time=641.644..641.645 rows=1 loops=14)
InitPlan 1 (returns $1)
-> Limit (cost=0.00..1.83 rows=1 width=8) (actual time=641.640..641.641 rows=1 loops=14)
-> Index Scan using allevents_idx on allevents (cost=0.00..322672.36 rows=175938 width=8) (actual time=641.638..641.638 rows=1 loops=14)
Index Cond: ((eventtime IS NOT NULL) AND ((eventtype)::text = val.val))
Total runtime: 8983.203 ms
Using the recursive query suggested by @jjanes, the query runs in 4 to 5 seconds with the following plan:
CTE Scan on t (cost=260.32..448.63 rows=101 width=32) (actual time=0.146..4325.598 rows=22 loops=1)
CTE t
-> Recursive Union (cost=2.52..260.32 rows=101 width=32) (actual time=0.075..1.449 rows=22 loops=1)
-> Result (cost=2.52..2.53 rows=1 width=0) (actual time=0.074..0.074 rows=1 loops=1)
InitPlan 1 (returns $1)
-> Limit (cost=0.00..2.52 rows=1 width=13) (actual time=0.070..0.071 rows=1 loops=1)
-> Index Scan using allevents_idx2 on allevents (cost=0.00..9315751.37 rows=3696851 width=13) (actual time=0.070..0.070 rows=1 loops=1)
Index Cond: ((eventtype)::text IS NOT NULL)
-> WorkTable Scan on t (cost=0.00..25.58 rows=10 width=32) (actual time=0.059..0.060 rows=1 loops=22)
Filter: (eventtype IS NOT NULL)
SubPlan 3
-> Result (cost=2.53..2.54 rows=1 width=0) (actual time=0.059..0.059 rows=1 loops=21)
InitPlan 2 (returns $3)
-> Limit (cost=0.00..2.53 rows=1 width=13) (actual time=0.057..0.057 rows=1 loops=21)
-> Index Scan using allevents_idx2 on allevents (cost=0.00..3114852.66 rows=1232284 width=13) (actual time=0.055..0.055 rows=1 loops=21)
Index Cond: (((eventtype)::text IS NOT NULL) AND ((eventtype)::text > t.eventtype))
SubPlan 6
-> Result (cost=1.83..1.84 rows=1 width=0) (actual time=196.549..196.549 rows=1 loops=22)
InitPlan 5 (returns $6)
-> Limit (cost=0.00..1.83 rows=1 width=8) (actual time=196.546..196.546 rows=1 loops=22)
-> Index Scan using allevents_idx on allevents (cost=0.00..322946.21 rows=176041 width=8) (actual time=196.544..196.544 rows=1 loops=22)
Index Cond: ((eventtime IS NOT NULL) AND ((eventtype)::text = t.eventtype))
Total runtime: 4325.694 ms
What you need is a "skip scan" or "loose index scan". PostgreSQL's planner does not yet implement those automatically, but you can trick it into doing one with a recursive query.
WITH RECURSIVE t AS (
SELECT min(eventtype) AS eventtype FROM allevents
UNION ALL
SELECT (SELECT min(eventtype) as eventtype FROM allevents WHERE eventtype > t.eventtype)
FROM t where t.eventtype is not null
)
select eventtype, (select max(eventtime) from allevents where eventtype=t.eventtype) from t;
There may be a way to collapse the max(eventtime) into the recursive query rather than doing it outside that query, but if so I have not hit upon it.
This needs an index on (eventtype, eventtime) in order to be efficient. You can make it DESC on eventtime, but that is not necessary. This is efficient only if eventtype has only a few distinct values (21 of them, in your case).
Based on the question you already have the relevant index.
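The recursive query works like a skip scan. Each step asks the index for the smallest eventtype greater than the previous one, and the outer subquery then reads only the last entry of that type. The work therefore grows with the number of distinct types (about 21 here) rather than with the 3.5M rows.

The minimal Python sketch below follows the same access pattern. A sorted list of (eventtype, eventtime) tuples stands in for the (eventtype, eventtime) index; the sample data and function name are hypothetical.

```python
import bisect
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Entry = Tuple[str, datetime]


def loose_index_scan_max(index: List[Entry]) -> Tuple[Dict[str, datetime], int]:
    """Return ({eventtype: max(eventtime)}, seeks), doing one binary-search
    seek per distinct eventtype instead of reading every entry."""
    result: Dict[str, datetime] = {}
    seeks = 0
    pos = 0
    while pos < len(index):
        etype = index[pos][0]  # smallest eventtype greater than the previous one
        # Jump past every entry of etype (all eventtimes are <= datetime.max).
        end = bisect.bisect_right(index, (etype, datetime.max))
        seeks += 1
        result[etype] = index[end - 1][1]  # last entry of the group = max(eventtime)
        pos = end
    return result, seeks


base = datetime(2010, 1, 1)
index = sorted(
    (f"TYPE_{t}", base + timedelta(minutes=i))  # hypothetical sample events
    for t in range(3)
    for i in range(10_000)
)
latest, seeks = loose_index_scan_max(index)
print(seeks, latest)
```

Each seek costs O(log n), so 3 types over 30,000 entries take 3 seeks rather than a full pass. That is the same reason the recursive query beats the sequential scan behind the plain GROUP BY.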
If upgrading to Postgres 9.3 or an index on (eventtype, eventtime desc) doesn't make a difference, this is a case where rewriting the query so it uses a correlated subquery works very well if you can enumerate all of the event types manually:
select val as eventtype,
(select max(eventtime)
from allevents
where allevents.eventtype = val
) as eventtime
from unnest('{type1,type2,…}'::text[]) as val;
Here are the plans I get when running similar queries:
denis=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.1 on x86_64-apple-darwin11.4.2, compiled by Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), 64-bit
(1 row)
Test data:
denis=# create table test (evttype int, evttime timestamp, primary key (evttype, evttime));
CREATE TABLE
denis=# insert into test (evttype, evttime) select i, now() + (i % 3) * interval '1 min' - j * interval '1 sec' from generate_series(1,10) i, generate_series(1,10000) j;
INSERT 0 100000
denis=# create index on test (evttime, evttype);
CREATE INDEX
denis=# vacuum analyze test;
VACUUM
First query:
denis=# explain analyze select evttype, max(evttime) from test group by evttype;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2041.00..2041.10 rows=10 width=12) (actual time=54.983..54.987 rows=10 loops=1)
-> Seq Scan on test (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.009..15.954 rows=100000 loops=1)
Total runtime: 55.045 ms
(3 rows)
Second query:
denis=# explain analyze select val as evttype, (select max(evttime) from test where test.evttype = val) as evttime from unnest('{1,2,3,4,5,6,7,8,9,10}'::int[]) val;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Function Scan on unnest val (cost=0.00..48.39 rows=100 width=4) (actual time=0.086..0.292 rows=10 loops=1)
SubPlan 2
-> Result (cost=0.46..0.47 rows=1 width=0) (actual time=0.024..0.024 rows=1 loops=10)
InitPlan 1 (returns $1)
-> Limit (cost=0.42..0.46 rows=1 width=8) (actual time=0.021..0.021 rows=1 loops=10)
-> Index Only Scan Backward using test_pkey on test (cost=0.42..464.42 rows=10000 width=8) (actual time=0.019..0.019 rows=1 loops=10)
Index Cond: ((evttype = val.val) AND (evttime IS NOT NULL))
Heap Fetches: 0
Total runtime: 0.370 ms
(9 rows)
An index on (eventtype, eventtime desc) should help, or a REINDEX of the primary key index. I would also recommend changing the type of eventtype to an enum (if the set of types is fixed) or to int/smallint. This decreases the size of the data and the indexes, so queries will run faster.
I have a very small table "events" with just 10,703 records.
The following query takes about 600 ms:
SELECT count(id)
FROM events
WHERE event_date > now()
AND earth_distance((select position from zips where zip='94121'), ll_to_earth(venue_lat, venue_lon))<16090;
I tried to create a GiST index like this:
CREATE INDEX latlon_idx on events USING gist(ll_to_earth(venue_lat, venue_lon));
but it didn't change anything. I also have an index on event_date.
Here's the EXPLAIN ANALYZE output:
Aggregate (cost=5400.48..5400.49 rows=1 width=8) (actual time=615.479..615.479 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using zips_zip_idx on zips (cost=0.00..8.27 rows=1 width=56) (actual time=0.051..0.056 rows=1 loops=1)
Index Cond: ((zip)::text = '94121'::text)
-> Bitmap Heap Scan on events (cost=144.41..5386.03 rows=2468 width=8) (actual time=16.065..599.613 rows=3347 loops=1)
Recheck Cond: (event_date > now())
Filter: (sec_to_gc(cube_distance(($0)::cube, (ll_to_earth((venue_lat)::double precision, (venue_lon)::double precision))::cube)) < 16090::double precision)
-> Bitmap Index Scan on events_date_idx (cost=0.00..143.79 rows=7405 width=0) (actual time=13.523..13.523 rows=7614 loops=1)
Index Cond: (event_date > now())
Total runtime: 615.663 ms
(10 rows)
What else can I try to speed it up?
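The earthdistance documentation describes earth_box(earth, radius) as the function meant for indexed searches with the cube @> operator. It also says to add a second earth_distance check, because the box contains some points farther away than the radius. A bare `earth_distance(...) < 16090` comparison is not in a form the GiST index can serve, which matches the plan above: the distance appears only as a Filter.

The minimal Python sketch below shows that two-step pattern: a cheap box test first, then the exact great-circle distance. It assumes a spherical Earth with earthdistance's default radius of 6378168 m. The center point and sample points are hypothetical, and the box ignores the poles and the antimeridian.

```python
import math
import random
from typing import Tuple

EARTH_RADIUS_M = 6378168.0  # radius assumed by earthdistance's earth()


def great_circle_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in meters on a sphere of radius EARTH_RADIUS_M."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bounding_box(lat: float, lon: float, radius_m: float) -> Tuple[float, float, float, float]:
    """Lat/lon box containing every point within radius_m of (lat, lon).
    Assumes the circle reaches neither a pole nor the antimeridian."""
    d = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(d)
    dlon = math.degrees(math.asin(math.sin(d) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within(lat0: float, lon0: float, radius_m: float, lat: float, lon: float) -> bool:
    lat_min, lat_max, lon_min, lon_max = bounding_box(lat0, lon0, radius_m)
    # Step 1: cheap box test (the part an index could answer).
    if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
        return False
    # Step 2: exact check, since the box corners lie farther than radius_m.
    return great_circle_m(lat0, lon0, lat, lon) < radius_m


center = (37.78, -122.49)  # hypothetical center point
random.seed(0)
points = [(center[0] + random.uniform(-0.3, 0.3), center[1] + random.uniform(-0.3, 0.3))
          for _ in range(1000)]
hits = [p for p in points if within(*center, 16090, *p)]
print(len(hits), "of", len(points), "points within 16090 m")
```

The box test never rejects a point that is actually within the radius. It only lets some extra candidates through, and the exact check then removes them.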