I have a SQL query which is using inner join on two tables and filtering data based on several params. Going by the query plan, for different values of query params (like different date range), Postgres is using different index.
I am aware of the fact that Postgres determines if the index has to be used or not, depending on the number or rows in the result set. But why does Postgres choose to use different index for same query. The query time varies by a factor of 10, between the two cases. How can I optimise the query? As Postgres does not allows the user to define the index to be used in a query.
Edit:
explain (analyze, buffers, verbose) SELECT COUNT(*) FROM "bookings" INNER JOIN "hotels" ON "hotels"."id" = "bookings"."hotel_id" WHERE "bookings"."hotel_id" = 37016 AND (bookings.status in (0,1,2,3,4,5,6,7,9,10,11,12)) AND (bookings.source in (0,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70) or bookings.status in (0,1,2,3,4,5,6,7,8,9,10,11,13)) AND (
bookings.source in (4,66,65)
OR
date(timezone('+05:30',bookings.created_at))>checkin
OR
(
( date(timezone('+05:30',bookings.created_at))=checkin
and
extract (epoch from COALESCE(cancellation_time,NOW())-bookings.created_at)>600
)
OR
( date(timezone('+05:30',bookings.created_at))<checkin
and
extract (epoch from COALESCE(cancellation_time,NOW())-bookings.created_at)>600
and
(
extract (epoch from ((bookings.checkin||' '||hotels.checkin_time)::timestamp -COALESCE(cancellation_time,bookings.checkin))) < extract(epoch from '16 hours'::interval)
OR
(DATE(bookings.checkout)-DATE(bookings.checkin))*(COALESCE(bookings.oyo_rooms,0)+COALESCE(bookings.owner_rooms,0)) > 3
)
)
)
) AND (bookings.checkin >= '2018-11-21') AND (bookings.checkin <= '2019-05-19') AND "bookings"."hotel_id" = '37016' AND "bookings"."status" IN (0, 1, 2, 3, 12);
QueryPlan : https://explain.depesz.com/s/SPeb
explain (analyze, buffers, verbose) SELECT COUNT(*) FROM "bookings" INNER JOIN "hotels" ON "hotels"."id" = 37016 WHERE "bookings"."hotel_id" = 37016 AND (bookings.status in (0,1,2,3,4,5,6,7,9,10,11,12)) AND (bookings.source in (0,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70) or bookings.status in (0,1,2,3,4,5,6,7,8,9,10,11,13)) AND (
bookings.source in (4,66,65)
OR
date(timezone('+05:30',bookings.created_at))>checkin
OR
(
( date(timezone('+05:30',bookings.created_at))=checkin
and
extract (epoch from COALESCE(cancellation_time,now())-bookings.created_at)>600
)
OR
( date(timezone('+05:30',bookings.created_at))<checkin
and
extract (epoch from COALESCE(cancellation_time,now())-bookings.created_at)>600
and
(extract (epoch from ((bookings.checkin||' '||hotels.checkin_time)::timestamp -COALESCE(cancellation_time,bookings.checkin))) < extract(epoch from '16 hours'::interval)
OR
(DATE(bookings.checkout)-DATE(bookings.checkin))*(COALESCE(bookings.oyo_rooms,0)+COALESCE(bookings.owner_rooms,0)) > 3
)
)
)
) AND (bookings.checkin >= '2018-11-22') AND (bookings.checkin <= '2019-05-19') AND "bookings"."hotel_id" = '37016' AND "bookings"."status" IN (0,1,2,3,4,12);
QueryPlan: https://explain.depesz.com/s/DWD
Finally found the solution to this problem. I am querying on the basis of more than 10 possible values of a column (status in this case). If I break this query into multiple sub-queries each querying upon only 1 status value and aggregate the result using union all, then the query plan executed uses optimized index for each subquery.
Results: The query time decreased by 10 times by this change.
Possible explanation for this behaviour, the query planner fetches less number of rows for each subquery and uses the optimized index in this case. I am not sure about if this is the correct explanation.
I want to compare two results of queries of the same table, by checking theresulting row count, but Postgres doesn't support column aliases in the where clause.
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count1=(select count(ident) from artprice where article=article.id)
)
I also cannot use aggregate functions in the where clause, so
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count(ident)=(select count(ident) from artprice where article=article.id)
)
also doesn't work. Any ideas?
EDIT:
What I want to get are articles where every article price is outside of a valid range defined by validFrom andValidTo.
I now changed the statement by negating the positive conditions:
Select distinct article.id from Article article, ArtPrice price
where
(
(article.version=?)
and
(
(
(
(
(not(price.valid_from>=?)) or (not(price.valid_to<=?))
)
and
(
(not(price.valid_from<=?)) or (not(price.valid_to>=?))
)
)
and
(
(not(price.valid_to>=?)) or (not(price.valid_to<=?))
)
)
and
(
(not(price.valid_from>=?)) or (not(price.valid_from<=?))
)
)
) and article.id=price.article
Probably not the very elegant solution, but it works.
Aggregates are not allowed in WHERE clause, but there's HAVING clause for them.
EDIT: What I want to get are articles where every article price is outside of a valid range defined by validFrom andValidTo.
I think that bool_or() would be a good fit here when combined with range operations:
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
HAVING NOT bool_or(tsrange(price.valid_from, price.valid_to)
&& tsrange(to_timestamp(1586642400000),
to_timestamp(1672441199000)))
This reads as "...those having not any price tsrange overlap with given tsrange".
Postgresql also supports the SQL OVERLAPS operator:
(price.valid_from, price.valid_to) OVERLAPS (to_timestamp(1586642400000),
to_timestamp(1672441199000))
As a note, it operates on half-open intervals start <= time < end.