postgres NOW() function taking too long vs string equivalent

This is my first question on Stack Overflow, so forgive me if it is not properly structured.
I have a table t_table with a datetime column d_datetime, and I need to filter data from the past 5 days. All of the following queries work locally, where I have less data:
query 1.
SELECT * FROM t_table
WHERE d_datetime
BETWEEN '2020-08-28T00:00:00.024Z' AND '2020-09-02T00:00:00.024Z';
query 2.
SELECT * FROM t_table
WHERE d_datetime
BETWEEN (NOW() - INTERVAL '5 days') AND NOW();
query 3.
SELECT * FROM t_table
WHERE d_datetime > NOW() - INTERVAL '5 days';
However, when I move to the live database, only the first query runs to completion, in about 10 seconds. I cannot tell why, but the other two just keep churning and consuming a lot of processing power; I have never seen them complete, even after waiting up to 5 minutes.
I have tried generating the strings used for d_datetime in the first query automatically:
query 4.
SELECT * FROM t_table
WHERE d_datetime
BETWEEN
(TO_CHAR(NOW() - INTERVAL '5 days', 'YYYY-MM-ddThh:MI:SS.024Z'))
AND
(TO_CHAR(NOW(), 'YYYY-MM-ddThh:MI:SS.024Z'))
but it throws the following error:
operator does not exist: timestamp without time zone >= text
My questions are:
Is there any particular reason why query 1 is so fast while the others take so much longer to run on a large dataset?
Why does query 4 fail when it generates practically the same string format as query 1 ('YYYY-MM-ddThh:mm:ss.024Z')?
The following is the EXPLAIN output for the first query:
EXPLAIN SELECT * FROM t_table
WHERE d_datetime
BETWEEN '2020-08-28T00:00:00.024Z' AND '2020-09-02T00:00:00.024Z';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate (cost=31346.37..31788.13 rows=35341 width=22) (actual time=388.622..388.845 rows=6 loops=1)
Output: count(_hyper_12_67688_chunk.octets), _hyper_12_67688_chunk.application, (date_trunc('day'::text, _hyper_12_67688_chunk.entry_time))
Group Key: (date_trunc('day'::text, _hyper_12_67688_chunk.entry_time)), _hyper_12_67688_chunk.application
Buffers: shared hit=17193
-> Gather (cost=27105.45..31081.31 rows=35341 width=22) (actual time=377.109..398.285 rows=11 loops=1)
Output: _hyper_12_67688_chunk.application, (date_trunc('day'::text, _hyper_12_67688_chunk.entry_time)), (PARTIAL count(_hyper_12_67688_chunk.octets))
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=17193
-> Partial HashAggregate (cost=26105.45..26547.21 rows=35341 width=22) (actual time=174.272..174.535 rows=6 loops=2)
Output: _hyper_12_67688_chunk.application, (date_trunc('day'::text, _hyper_12_67688_chunk.entry_time)), PARTIAL count(_hyper_12_67688_chunk.octets)
Group Key: date_trunc('day'::text, _hyper_12_67688_chunk.entry_time), _hyper_12_67688_chunk.application
Buffers: shared hit=17193
Worker 0: actual time=27.942..28.206 rows=5 loops=1
Buffers: shared hit=579
-> Result (cost=1.73..25272.75 rows=111027 width=18) (actual time=0.805..141.094 rows=94662 loops=2)
Output: _hyper_12_67688_chunk.application, date_trunc('day'::text, _hyper_12_67688_chunk.entry_time), _hyper_12_67688_chunk.octets
Buffers: shared hit=17193
Worker 0: actual time=1.576..23.928 rows=6667 loops=1
Buffers: shared hit=579
-> Parallel Append (cost=1.73..23884.91 rows=111027 width=18) (actual time=0.800..114.488 rows=94662 loops=2)
Buffers: shared hit=17193
Worker 0: actual time=1.572..20.204 rows=6667 loops=1
Buffers: shared hit=579
-> Parallel Bitmap Heap Scan on _timescaledb_internal._hyper_12_67688_chunk (cost=1.73..11.23 rows=8 width=17) (actual time=1.570..1.618 rows=16 loops=1)
Output: _hyper_12_67688_chunk.octets, _hyper_12_67688_chunk.application, _hyper_12_67688_chunk.entry_time
Recheck Cond: ((_hyper_12_67688_chunk.entry_time >= '2020-08-28 05:45:03.024'::timestamp without time zone) AND (_hyper_12_67688_chunk.entry_time <= '2020-09-02 11:45:03.024'::timestamp without time zone))
Filter: ((_hyper_12_67688_chunk.application)::text = 'dns'::text)
Rows Removed by Filter: 32
Buffers: shared hit=11
Worker 0: actual time=1.570..1.618 rows=16 loops=1
Buffers: shared hit=11
-> Bitmap Index Scan on _hyper_12_67688_chunk_dpi_applications_entry_time_idx (cost=0.00..1.73 rows=48 width=0) (actual time=1.538..1.538 rows=48 loops=1)
Index Cond: ((_hyper_12_67688_chunk.entry_time >= '2020-08-28 05:45:03.024'::timestamp without time zone) AND (_hyper_12_67688_chunk.entry_time <= '2020-09-02 11:45:03.024'::timestamp without time zone))
Buffers: shared hit=2
Worker 0: actual time=1.538..1.538 rows=48 loops=1
Buffers: shared hit=2
-> Parallel Index Scan Backward using _hyper_12_64752_chunk_dpi_applications_entry_time_idx on _timescaledb_internal._hyper_12_64752_chunk (cost=0.14..2.36 rows=1 width=44) (actual time=0.040..0.076 rows=52 loops=1)
Output: _hyper_12_64752_chunk.octets, _hyper_12_64752_chunk.application, _hyper_12_64752_chunk.entry_time
Index Cond: ((_hyper_12_64752_chunk.entry_time >= '2020-08-28 05:45:03.024'::timestamp without time zone) AND (_hyper_12_64752_chunk.entry_time <= '2020-09-02 11:45:03.024'::timestamp without time zone))
Filter: ((_hyper_12_64752_chunk.application)::text = 'dns'::text)
Rows Removed by Filter: 52
Buffers: shared hit=
-- cut logs
-> Parallel Seq Scan on _timescaledb_internal._hyper_12_64814_chunk (cost=0.00..2.56 rows=14 width=17) (actual time=0.017..0.038 rows=32 loops=1)
Output: _hyper_12_64814_chunk.octets, _hyper_12_64814_chunk.application, _hyper_12_64814_chunk.entry_time
Filter: ((_hyper_12_64814_chunk.entry_time >= '2020-08-28 05:45:03.024'::timestamp without time zone) AND (_hyper_12_64814_chunk.entry_time <= '2020-09-02 11:45:03.024'::timestamp without time zone) AND ((_hyper_12_64814_chunk.application)::text = 'dns'::text))
Rows Removed by Filter: 40
Buffers: shared hit=2
-> Parallel Seq Scan on _timescaledb_internal._hyper_12_62262_chunk (cost=0.00..2.54 rows=9 width=19) (actual time=0.027..0.039 rows=15 loops=1)
Output: _hyper_12_62262_chunk.octets, _hyper_12_62262_chunk.application, _hyper_12_62262_chunk.entry_time
Filter: ((_hyper_12_62262_chunk.entry_time >= '2020-08-28 05:45:03.024'::timestamp without time zone) AND (_hyper_12_62262_chunk.entry_time <= '2020-09-02 11:45:03.024'::timestamp without time zone) AND ((_hyper_12_62262_chunk.application)::text = 'dns'::text))
Rows Removed by Filter: 37
Buffers: shared hit=2
Planning Time: 3367.445 ms
Execution Time: 417.245 ms
(7059 rows)
The Parallel Index Scan Backward using... log continues for all hypertable chunks in the table.
The other three queries mentioned above as unsuccessful still do not run to completion; they just end up eventually filling the memory, so I cannot post their EXPLAIN output, sorry.
Please let me know if my question has not been properly structured. Thanks.

You are using a partitioned table that likely has a lot of partitions: planning alone takes more than 3 seconds for this query.
You are probably using PostgreSQL v11 or earlier. v12 introduced partition pruning at execution time, while v11 can only exclude partitions at query planning time.
In your first query the WHERE condition contains constants, so that works. In the other queries, the function now() is used, whose result value is only known at query execution time (it is STABLE, not IMMUTABLE), so partition pruning cannot take place at query planning time. Query planning and execution need not happen at the same time – think of prepared statements.
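If upgrading to v12 or later is not an option, one workaround (a sketch, not part of the original answer) is to turn the now()-based bounds into literal constants before the statement is planned, for example with dynamic SQL in PL/pgSQL; the table and column names below are the ones from the question:
DO $$
DECLARE
    row_count bigint;
BEGIN
    -- format() with %L interpolates the computed timestamps as quoted literals,
    -- so the dynamically planned statement sees constants (like query 1) and the
    -- pre-v12 planner can exclude partitions at planning time.
    EXECUTE format(
        'SELECT count(*) FROM t_table WHERE d_datetime BETWEEN %L AND %L',
        (now() - interval '5 days')::timestamp,
        now()::timestamp)
    INTO row_count;
    RAISE NOTICE 'rows in the last 5 days: %', row_count;
END $$;
Equivalently, the application can compute the two timestamps itself and send them as literals, which is exactly why query 1 plans so quickly.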

Related

Long waiting for SELECT query on a 70 million records table. How to improve performance?

I have a table in Postgres with more than 70 million records that relates temperature to a certain time (day) and space (meteorological station). I need to do some calculations for a given period of time and a set of meteorological stations, such as sum, average, quartile and normal value. The query I am using takes 30 seconds to return. How can I improve this?
This is the EXPLAIN (ANALYZE, BUFFERS) output for SELECT avg(p) AS rain FROM waterbalances GROUP BY extract(month from date), extract(year from date);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=3310337.68..3314085.15 rows=13836 width=24) (actual time=21252.008..21252.624 rows=478 loops=1)
Group Key: (date_part('month'::text, (date)::timestamp without time zone)), (date_part('year'::text, (date)::timestamp without time zone))
Buffers: shared hit=6335 read=734014
-> Gather Merge (cost=3310337.68..3313566.30 rows=27672 width=48) (actual time=21251.984..21261.693 rows=1432 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=15841 read=2195624
-> Sort (cost=3309337.66..3309372.25 rows=13836 width=48) (actual time=21130.846..21130.862 rows=477 loops=3)
Sort Key: (date_part('month'::text, (date)::timestamp without time zone)), (date_part('year'::text, (date)::timestamp without time zone))
Sort Method: quicksort Memory: 92kB
Worker 0: Sort Method: quicksort Memory: 92kB
Worker 1: Sort Method: quicksort Memory: 92kB
Buffers: shared hit=15841 read=2195624
-> Partial HashAggregate (cost=3308109.29..3308386.01 rows=13836 width=48) (actual time=21130.448..21130.618 rows=477 loops=3)
Group Key: date_part('month'::text, (date)::timestamp without time zone), date_part('year'::text, (date)::timestamp without time zone)
Buffers: shared hit=15827 read=2195624
-> Parallel Seq Scan on waterbalances (cost=0.00..3009020.66 rows=39878483 width=24) (actual time=1.528..15460.388 rows=31914000 loops=3)
Buffers: shared hit=15827 read=2195624
Planning Time: 7.621 ms
Execution Time: 21262.552 ms
(20 rows)

Postgres weird query plan when the number of records less than "limit"

I have a query that is very fast with a large date filter:
EXPLAIN ANALYZE
SELECT "advertisings"."id",
"advertisings"."page_id",
"advertisings"."page_name",
"advertisings"."created_at",
"posts"."image_url",
"posts"."thumbnail_url",
"posts"."post_content",
"posts"."like_count"
FROM "advertisings"
INNER JOIN "posts" ON "advertisings"."post_id" = "posts"."id"
WHERE "advertisings"."created_at" >= '2020-01-01T00:00:00Z'
AND "advertisings"."created_at" < '2020-12-02T23:59:59Z'
ORDER BY "like_count" DESC LIMIT 20
And the query plan is:
Limit (cost=0.85..20.13 rows=20 width=552) (actual time=0.026..0.173 rows=20 loops=1)
-> Nested Loop (cost=0.85..951662.55 rows=987279 width=552) (actual time=0.025..0.169 rows=20 loops=1)
-> Index Scan using posts_like_count_idx on posts (cost=0.43..378991.65 rows=1053015 width=504) (actual time=0.013..0.039 rows=20 loops=1)
-> Index Scan using advertisings_post_id_index on advertisings (cost=0.43..0.53 rows=1 width=52) (actual time=0.005..0.006 rows=1 loops=20)
Index Cond: (post_id = posts.id)
Filter: ((created_at >= '2020-01-01 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
Planning Time: 0.365 ms
Execution Time: 0.199 ms
However, when I narrow the filter (changing it to "created_at" >= '2020-11-25T00:00:00Z'), which returns 9 records (fewer than the limit of 20), the query is very slow:
EXPLAIN ANALYZE
SELECT "advertisings"."id",
"advertisings"."page_id",
"advertisings"."page_name",
"advertisings"."created_at",
"posts"."image_url",
"posts"."thumbnail_url",
"posts"."post_content",
"posts"."like_count"
FROM "advertisings"
INNER JOIN "posts" ON "advertisings"."post_id" = "posts"."id"
WHERE "advertisings"."created_at" >= '2020-11-25T00:00:00Z'
AND "advertisings"."created_at" < '2020-12-02T23:59:59Z'
ORDER BY "like_count" DESC LIMIT 20
Query plan:
Limit (cost=1000.88..8051.73 rows=20 width=552) (actual time=218.485..4155.336 rows=9 loops=1)
-> Gather Merge (cost=1000.88..612662.09 rows=1735 width=552) (actual time=218.483..4155.328 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=0.85..611461.80 rows=723 width=552) (actual time=118.170..3786.176 rows=3 loops=3)
-> Parallel Index Scan using posts_like_count_idx on posts (cost=0.43..372849.07 rows=438756 width=504) (actual time=0.024..1542.094 rows=351005 loops=3)
-> Index Scan using advertisings_post_id_index on advertisings (cost=0.43..0.53 rows=1 width=52) (actual time=0.006..0.006 rows=0 loops=1053015)
Index Cond: (post_id = posts.id)
Filter: ((created_at >= '2020-11-25 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
Rows Removed by Filter: 1
Planning Time: 0.394 ms
Execution Time: 4155.379 ms
I spent hours googling but couldn't find the right solution. Any help would be greatly appreciated.
Updated
When I continue narrowing the filter to
WHERE "advertisings"."created_at" >= '2020-11-27T00:00:00Z'
AND "advertisings"."created_at" < '2020-12-02T23:59:59Z'
which also returns the same 9 records as the slow query above, this time the query is really fast again:
Limit (cost=8082.99..8083.04 rows=20 width=552) (actual time=0.062..0.065 rows=9 loops=1)
-> Sort (cost=8082.99..8085.40 rows=962 width=552) (actual time=0.061..0.062 rows=9 loops=1)
Sort Key: posts.like_count DESC
Sort Method: quicksort Memory: 32kB
-> Nested Loop (cost=0.85..8057.39 rows=962 width=552) (actual time=0.019..0.047 rows=9 loops=1)
-> Index Scan using advertisings_created_at_index on advertisings (cost=0.43..501.30 rows=962 width=52) (actual time=0.008..0.012 rows=9 loops=1)
Index Cond: ((created_at >= '2020-11-27 00:00:00'::timestamp without time zone) AND (created_at < '2020-12-02 23:59:59'::timestamp without time zone))
-> Index Scan using posts_pkey on posts (cost=0.43..7.85 rows=1 width=504) (actual time=0.003..0.003 rows=1 loops=9)
Index Cond: (id = advertisings.post_id)
Planning Time: 0.540 ms
Execution Time: 0.096 ms
I have no idea what is happening.
PostgreSQL follows two different strategies in the first two and the last query:
If there are many matching advertisings rows, it uses a nested loop join to fetch the rows in the order of the ORDER BY clause and discards rows that don't match the condition until it has found 20.
If there are few matching advertisings rows, it fetches those few rows, then the matching rows in posts, then sorts and takes the first 20 rows.
The second execution is slow because PostgreSQL overestimates the rows in advertisings that match the condition. See how it estimates 962 instead of 9 in the third query?
The solution is to improve PostgreSQL's estimate:
if running
ANALYZE advertisings;
is enough to make the slow query fast, tell PostgreSQL to collect statistics more often:
ALTER TABLE advertisings SET (autovacuum_analyze_scale_factor = 0.05);
if that is not enough, try collecting more detailed statistics:
SET default_statistics_target = 1000;
ANALYZE advertisings;
You can experiment with values up to 10000. Once you have found the value that works, persist it:
ALTER TABLE advertisings ALTER created_at SET STATISTICS 1000;
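To check whether the new statistics actually fixed the estimate, you can re-run the analysis and look at the estimated row count for the narrow range (a sketch using the table and filter from the question):
ANALYZE advertisings;

EXPLAIN
SELECT *
FROM advertisings
WHERE created_at >= '2020-11-25T00:00:00Z'
  AND created_at < '2020-12-02T23:59:59Z';
-- The estimated rows for this range should now be close to the actual 9,
-- which lets the planner choose the fast sort-based plan from the third query.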

Return all requests from a specific date that are not finished postgresql

I need to make a query that returns all requests that are not finished or canceled, from the beginning of the recordings up to a specific date. The way I'm doing it right now takes too much time and returns an error: 'User query might have needed to see row versions that must be removed' (my guess is that it's due to a lack of RAM).
Below is the query I'm using, and here is some information:
T1, where each new entry is saved, with an ID, creation date, status (open, closed) and other keys to several tables.
T2, where each change made to a request is saved (in progress, waiting, rejected and closed), along with the date of the change and other keys to other tables.
SELECT T1.id_request,
T1.dt_created,
T1.status
FROM T1
LEFT JOIN T2
ON T1.id_request = T2.id_request
WHERE (T1.dt_created >= '2012-01-01 00:00:00' AND T1.dt_created <= '2020-05-31 23:59:59')
AND T1.id_request NOT IN (SELECT T2.id_request
FROM T2
WHERE ((T2.dt_change >= '2012-01-01 00:00:00'
AND T2.dt_change <= '2020-05-31 23:59:59')
OR T2.dt_change IS NULL)
AND T2.status IN ('Closed','Canceled','rejected'))
My thought was to get everything that was received in T1 (I can't just retrieve what is open; that only works for today, not for a specific past date, which is what I want) between the beginning of the records and, let's say, the end of May, and then use WHERE T1.ID NOT IN (T2.ID with STATUS 'closed' in the same period). But as I've said, it takes forever and returns an error.
I use this same code to get what was open for a specific month (1st to 30th) and it works perfectly fine.
Maybe this is not the best approach, but I couldn't think of any other way (I'm not an expert with SQL). If there's not enough information to provide an answer, feel free to ask.
As requested by @MikeOrganek, here is the EXPLAIN ANALYZE output:
Nested Loop Left Join (cost=27985.55..949402.48 rows=227455 width=20) (actual time=2486.433..54832.280 rows=47726 loops=1)
Buffers: shared hit=293242 read=260670
Seq Scan on T1 (cost=27984.99..324236.82 rows=73753 width=20) (actual time=2467.499..6202.970 rows=16992 loops=1)
Filter: ((dt_created >= '2020-05-01 00:00:00-03'::timestamp with time zone) AND (dt_created <= '2020-05-31 23:59:59-03'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 6085779
Buffers: shared hit=188489 read=250098
SubPlan 1
Nested Loop (cost=7845.36..27983.13 rows=745 width=4) (actual time=129.379..1856.518 rows=168690 loops=1)
Buffers: shared hit=60760
Seq Scan on T3(cost=0.00..5.21 rows=3 width=8) (actual time=0.057..0.104 rows=3 loops=1)
Filter: ((status_request)::text = ANY ('{Closed,Canceled,rejected}'::text[]))
Rows Removed by Filter: 125
Buffers: shared hit=7
Bitmap Heap Scan on T2(cost=7845.36..9321.70 rows=427 width=8) (actual time=477.324..607.171 rows=56230 loops=3)
Recheck Cond: ((dt_change >= '2020-05-01 00:00:00-03'::timestamp with time zone) AND (dt_change <= '2020-05-31 23:59:59-03'::timestamp with time zone) AND (T2.ID_status= T3.ID_status))
Rows Removed by Index Recheck: 87203
Heap Blocks: exact=36359
Buffers: shared hit=60753
BitmapAnd (cost=7845.36..7845.36 rows=427 width=0) (actual time=473.864..473.864 rows=0 loops=3)
Buffers: shared hit=24394
Bitmap Index Scan on idx_ix_T2_dt_change (cost=0.00..941.81 rows=30775 width=0) (actual time=47.380..47.380 rows=306903 loops=3)
Index Cond: ((dt_change >= '2020-05-01 00:00:00-03'::timestamp with time zone) AND (dt_change<= '2020-05-31 23:59:59-03'::timestamp with time zone))
Buffers: shared hit=2523
Bitmap Index Scan on idx_T2_ID_status (cost=0.00..6895.49 rows=262724 width=0) (actual time=418.942..418.942 rows=2105165 loops=3)
Index Cond: (ID_status = T3.ID_status )
Buffers: shared hit=21871
Index Only Scan using idx_ix_T2_id_request on T2 (cost=0.56..8.30 rows=18 width=4) (actual time=0.369..2.859 rows=3 loops=16992)
Index Cond: (id_request = t17.id_request )
Heap Fetches: 44807
Buffers: shared hit=104753 read=10572
Planning time: 23.424 ms
Execution time: 54841.261 ms
And here is the main difference with dt_change IS NULL:
Planning time: 34.320 ms
Execution time: 230683.865 ms
Thanks
It looks like the OR T2.dt_change IS NULL is very costly, in that it increased overall execution time by a factor of five.
The only option I can see is changing the NOT IN to a NOT EXISTS, as below.
SELECT T1.id_request,
T1.dt_created,
T1.status
FROM T1
LEFT JOIN T2
ON T1.id_request = T2.id_request
WHERE T1.dt_created >= '2012-01-01 00:00:00'
AND T1.dt_created <= '2020-05-31 23:59:59'
AND NOT EXISTS (SELECT 1
FROM T2
WHERE id_request = T1.id_request
AND ( ( dt_change >= '2012-01-01 00:00:00'
AND dt_change <= '2020-05-31 23:59:59')
OR dt_change IS NULL)
AND status IN ('Closed','Canceled','rejected'))
But I expect that to give you only a marginal improvement. Can you please see how much this change helps?
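For reference, the same anti-join can also be expressed with a LEFT JOIN ... IS NULL; this is only a sketch using the question's table and column names, not part of the original answer:
SELECT T1.id_request,
       T1.dt_created,
       T1.status
FROM T1
LEFT JOIN T2 closed
       ON closed.id_request = T1.id_request
      AND ((closed.dt_change >= '2012-01-01 00:00:00'
            AND closed.dt_change <= '2020-05-31 23:59:59')
           OR closed.dt_change IS NULL)
      AND closed.status IN ('Closed', 'Canceled', 'rejected')
WHERE T1.dt_created >= '2012-01-01 00:00:00'
  AND T1.dt_created <= '2020-05-31 23:59:59'
  AND closed.id_request IS NULL;   -- keep only requests with no closing/cancelling change
The bare LEFT JOIN T2 from the original query is left out here, since none of its columns are selected and it only multiplies T1 rows.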

Postgres Optimizer: Why it lies about costs? [EDIT] How to pick random_page_cost?

I've got the following issue with Postgres:
I have two tables, A and B:
A has 64 million records
B has 16 million records
A has a b_id field which is indexed --> ix_A_b_id
B has a datetime_field which is indexed --> ix_B_datetime
I have the following query:
SELECT
A.id,
B.some_field
FROM
A
JOIN
B
ON A.b_id = B.id
WHERE
B.datetime_field BETWEEN 'from' AND 'to'
This query is fine when the difference between from and to is small; in that case Postgres uses both indexes and I get results quite fast.
When the difference between the dates is bigger the query slows down a lot, because Postgres decides to use only ix_B_datetime and then a full scan on the table with 64 M records... which is simply stupid.
I found the point at which the optimizer decides that using a full scan is faster.
For dates between
2019-03-10 17:05:00 and 2019-03-15 01:00:00
it gets a cost similar to that for
2019-03-10 17:00:00 and 2019-03-15 01:00:00.
But the fetch time for the first query is about 50 ms, while for the second it is almost 2 minutes.
The plans are below:
Nested Loop (cost=1.00..3484455.17 rows=113057 width=8)
-> Index Scan using ix_B_datetime on B (cost=0.44..80197.62 rows=28561 width=12)
Index Cond: ((datetime_field >= '2019-03-10 17:05:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
-> Index Scan using ix_A_b_id on A (cost=0.56..112.18 rows=701 width=12)
Index Cond: (b_id = B.id)
Hash Join (cost=80615.72..3450771.89 rows=113148 width=8)
Hash Cond: (A.b_id = B.id)
-> Seq Scan on spot (cost=0.00..3119079.50 rows=66652050 width=12)
-> Hash (cost=80258.42..80258.42 rows=28584 width=12)
-> Index Scan using ix_B_datetime on B (cost=0.44..80258.42 rows=28584 width=12)
Index Cond: ((datetime_field >= '2019-03-10 17:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
So my question is: why does my Postgres lie about costs? Why does it calculate something as more expensive than it actually is? How can I fix that?
As a temporary measure I had to rewrite the query to always use the index on table A, but I do not like the following solution, because it's hacky and unclear, and it is slower for small chunks of data though much faster for bigger chunks:
with cc as (
select id, some_field from B WHERE B.datetime_field >= '2019-03-08'
AND B.datetime_field < '2019-03-15'
)
SELECT X.id, Y.some_field
FROM (SELECT b_id, id from A where b_id in (SELECT id from cc)) X
JOIN (SELECT id, some_field FROM cc) Y ON X.b_id = Y.id
EDIT:
So, as @a_horse_with_no_name suggested, I've played with RANDOM_PAGE_COST.
I've modified the query to count the number of entries, because fetching everything was unnecessary, so the query looks as follows:
SELECT count(*) FROM (
SELECT
A.id,
B.some_field
FROM
A
JOIN
B
ON A.b_id = B.id
WHERE
B.datetime_field BETWEEN '2019-03-01 00:00:00' AND '2019-03-15 01:00:00'
) A
And I've tested different values of the cost.
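For this kind of experiment the parameter can be changed per session, for example (a sketch, not from the original post):
SET random_page_cost = 0.25;  -- affects only the current session
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM A
JOIN B ON A.b_id = B.id
WHERE B.datetime_field BETWEEN '2019-03-01 00:00:00' AND '2019-03-15 01:00:00';
RESET random_page_cost;       -- back to the configured default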
RANDOM_PAGE_COST=0.25
Aggregate (cost=3491773.34..3491773.35 rows=1 width=8) (actual time=4166.998..4166.999 rows=1 loops=1)
Buffers: shared hit=1939402
-> Nested Loop (cost=1.00..3490398.51 rows=549932 width=0) (actual time=0.041..3620.975 rows=2462836 loops=1)
Buffers: shared hit=1939402
-> Index Scan using ix_B_datetime_field on B (cost=0.44..24902.79 rows=138927 width=8) (actual time=0.013..364.018 rows=313399 loops=1)
Index Cond: ((datetime_field >= '2019-03-01 00:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
Buffers: shared hit=311461
-> Index Only Scan using A_b_id_index on A (cost=0.56..17.93 rows=701 width=8) (actual time=0.004..0.007 rows=8 loops=313399)
Index Cond: (b_id = B.id)
Heap Fetches: 2462836
Buffers: shared hit=1627941
Planning time: 0.316 ms
Execution time: 4167.040 ms
RANDOM_PAGE_COST=1
Aggregate (cost=3918191.39..3918191.40 rows=1 width=8) (actual time=281236.100..281236.101 rows=1 loops=1)
" Buffers: shared hit=7531789 read=2567818, temp read=693 written=693"
-> Merge Join (cost=102182.07..3916816.56 rows=549932 width=0) (actual time=243755.551..280666.992 rows=2462836 loops=1)
Merge Cond: (A.b_id = B.id)
" Buffers: shared hit=7531789 read=2567818, temp read=693 written=693"
-> Index Only Scan using A_b_id_index on A (cost=0.56..3685479.55 rows=66652050 width=8) (actual time=0.010..263635.124 rows=64700055 loops=1)
Heap Fetches: 64700055
Buffers: shared hit=7220328 read=2567818
-> Materialize (cost=101543.05..102237.68 rows=138927 width=8) (actual time=523.618..1287.145 rows=2503965 loops=1)
" Buffers: shared hit=311461, temp read=693 written=693"
-> Sort (cost=101543.05..101890.36 rows=138927 width=8) (actual time=523.616..674.736 rows=313399 loops=1)
Sort Key: B.id
Sort Method: external merge Disk: 5504kB
" Buffers: shared hit=311461, temp read=693 written=693"
-> Index Scan using ix_B_datetime_field on B (cost=0.44..88589.92 rows=138927 width=8) (actual time=0.013..322.016 rows=313399 loops=1)
Index Cond: ((datetime_field >= '2019-03-01 00:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
Buffers: shared hit=311461
Planning time: 0.314 ms
Execution time: 281237.202 ms
RANDOM_PAGE_COST=2
Aggregate (cost=4072947.53..4072947.54 rows=1 width=8) (actual time=166896.775..166896.776 rows=1 loops=1)
" Buffers: shared hit=696849 read=2067171, temp read=194524 written=194516"
-> Hash Join (cost=175785.69..4071572.70 rows=549932 width=0) (actual time=29321.835..166332.812 rows=2462836 loops=1)
Hash Cond: (A.B_id = B.id)
" Buffers: shared hit=696849 read=2067171, temp read=194524 written=194516"
-> Seq Scan on A (cost=0.00..3119079.50 rows=66652050 width=8) (actual time=0.008..108959.789 rows=64700055 loops=1)
Buffers: shared hit=437580 read=2014979
-> Hash (cost=173506.11..173506.11 rows=138927 width=8) (actual time=29321.416..29321.416 rows=313399 loops=1)
Buckets: 131072 (originally 131072) Batches: 8 (originally 2) Memory Usage: 4084kB
" Buffers: shared hit=259269 read=52192, temp written=803"
-> Index Scan using ix_B_datetime_field on B (cost=0.44..173506.11 rows=138927 width=8) (actual time=1.676..29158.413 rows=313399 loops=1)
Index Cond: ((datetime_field >= '2019-03-01 00:00:00'::timestamp without time zone) AND (datetime_field < '2019-03-15 01:00:00'::timestamp without time zone))
Buffers: shared hit=259269 read=52192
Planning time: 7.367 ms
Execution time: 166896.824 ms
Still it's unclear to me: a cost of 0.25 is best for me, but everywhere I read that for an SSD it should be 1-1.5 (I'm using an AWS instance with SSD storage).
What is weird is that the plan at cost 1 is worse than at 2 and at 0.25.
So what value should I pick? Is there any way to calculate it?
The efficiency order in this case is 0.25 > 2 > 1, but what about other cases? How can I be sure that 0.25, which is good for my query, won't break other queries? Do I need to write performance tests for every query I have?

postgres how to debug why planning time is too long?

Version: Postgres 9.6.
I was not very clear in a question I asked in the past, and someone already answered it there, so I thought it best to post a new question with clearer information and a more specific question.
I am trying to join an event table with a dimension table.
The event table is a daily-partitioned table (3k children) with check constraints. The event table has 72 columns (I suspect that this is the issue).
I have simplified the query in order to demonstrate the question (in practice the range is wider and I query fields from both tables).
You can see that for this simple query the plan takes almost 10 seconds to build (my question is about planning time, not execution time).
If I query the child table directly (please don't advise using a UNION over all children in the range), the query plan takes a few ms.
explain analyze select campaign_id , spent as spent from events_daily r left join report_campaigns c on r.campaign_id = c.c_id where date >= '20170720' and date < '20170721' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.29..28.88 rows=2 width=26) (actual time=0.021..0.021 rows=0 loops=1)
-> Append (cost=0.00..12.25 rows=2 width=26) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on events_daily r (cost=0.00..0.00 rows=1 width=26) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((date >= '2017-07-20 00:00:00'::timestamp without time zone) AND (date < '2017-07-21 00:00:00'::timestamp without time zone))
-> Seq Scan on events_daily_20170720 r_1 (cost=0.00..12.25 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=1)
Filter: ((date >= '2017-07-20 00:00:00'::timestamp without time zone) AND (date < '2017-07-21 00:00:00'::timestamp without time zone))
-> Index Only Scan using report_campaigns_campaign_idx on report_campaigns c (cost=0.29..8.31 rows=1 width=8) (never executed)
Index Cond: (c_id = r.campaign_id)
Heap Fetches: 0
Planning time: 8393.337 ms
Execution time: 0.132 ms
(11 rows)
explain analyze select campaign_id , spent as spent from events_daily_20170720 r left join report_campaigns c on r.campaign_id = c.c_id where date >= '20170720' and date < '20170721' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.29..20.57 rows=1 width=26) (actual time=0.008..0.008 rows=0 loops=1)
-> Seq Scan on events_daily_20170720 r (cost=0.00..12.25 rows=1 width=26) (actual time=0.007..0.007 rows=0 loops=1)
Filter: ((date >= '2017-07-20 00:00:00'::timestamp without time zone) AND (date < '2017-07-21 00:00:00'::timestamp without time zone))
-> Index Only Scan using report_campaigns_campaign_idx on report_campaigns c (cost=0.29..8.31 rows=1 width=8) (never executed)
Index Cond: (c_id = r.campaign_id)
Heap Fetches: 0
Planning time: 0.242 ms
Execution time: 0.059 ms
\d events_daily_20170720
date | timestamp without time zone |
Check constraints:
"events_daily_20170720_date_check" CHECK (date >= '2017-07-20 00:00:00'::timestamp without time zone AND date < '2017-07-21 00:00:00'::timestamp without time zone)
Inherits: events_daily
show constraint_exclusion;
constraint_exclusion
----------------------
on
When running ltrace, it seems that it runs this thousands of times for each field (a hint that the planner touches all partition tables when building the plan):
strlen("process") = 7
memcpy(0x0b7aac10, "process", 8) = 0x0b7aac10
strlen("channel") = 7
memcpy(0x0b7aac68, "channel", 8) = 0x0b7aac68
strlen("deleted") = 7
memcpy(0x0b7aacc0, "deleted", 8) = 0x0b7aacc0
strlen("old_spent") = 9
memcpy(0x0b7aad18, "old_spent", 10)
The problem is that you have too many partitions.
As the documentation warns:
All constraints on all partitions of the master table are examined during constraint exclusion,
so large numbers of partitions are likely to increase query planning time considerably.
Partitioning using these techniques will work well with up to perhaps a hundred partitions;
don't try to use many thousands of partitions.
You should try to reduce the number of partitions by using a longer time interval for each partition.
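For example, with the inheritance partitioning used here (9.6), a monthly child could replace roughly thirty daily ones; a sketch with an assumed child name and only the check constraint shown:
-- Hypothetical monthly child of the question's events_daily parent
CREATE TABLE events_daily_201707 (
    CHECK (date >= '2017-07-01 00:00:00' AND date < '2017-08-01 00:00:00')
) INHERITS (events_daily);
With far fewer children, constraint exclusion has far fewer check constraints to examine, which is what drives the planning time down.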
Alternatively, you could try to change the application code to directly access the correct partition if possible, but that might prove difficult and it removes many advantages that partitioning should bring.