Postgres High CPU for every run until I run ANALYZE - postgresql

I have a batch job that processes about 50k rows a day. The query that is executed looks like the following. The tables are partitioned!
select *
from billing abstractbi0_
    inner join sale sale1_
        on abstractbi0_.sale_billing_date = sale1_.billing_date
        and abstractbi0_.sale_id = sale1_.id
where abstractbi0_.dtype in ('INVOICE_CORRECTED_BEFORE_BILLING', 'INVOICE', 'GOODWILL', 'REFUND', 'ZERO_SUM')
    and sale1_.proposal_id = '47037059-d231-40d9-a0f5-242577596b5c'
    and abstractbi0_.billing_date = '2022-09-27'
    and sale1_.billing_date = '2022-09-27';
Every time the batch job starts, the CPU goes to 100% and performance is very bad. When I run ANALYZE during the high-CPU phase, performance gets about 20x better.
The execution plan before running ANALYZE:
Nested Loop (cost=0.85..14.67 rows=1 width=574)
Join Filter: (abstractbi0_.sale_id = sale1_.id)
-> Index Scan using billing_p2022_09_sale_billing_date_sale_id_idx on billing_p2022_09 abstractbi0_ (cost=0.43..8.45 rows=1 width=467)
Index Cond: (sale_billing_date = '2022-09-27'::date)
Filter: ((billing_date = '2022-09-27'::date) AND ((dtype)::text = ANY ('{INVOICE_CORRECTED_BEFORE_BILLING,INVOICE,GOODWILL,REFUND,ZERO_SUM}'::text[])))
-> Index Scan using sale_p2022_09_pkey on sale_p2022_09 sale1_ (cost=0.43..6.20 rows=1 width=107)
Index Cond: (billing_date = '2022-09-27'::date)
Filter: (proposal_id = '47037059-d231-40d9-a0f5-242577596b5c'::uuid)
...and after ANALYZE:
Nested Loop (cost=0.85..16.91 rows=1 width=1084)
-> Index Scan using sale_p2022_09_billing_date_proposal_id_idx on sale_p2022_09 sale1_ (cost=0.43..8.45 rows=1 width=107)
Index Cond: ((billing_date = '2022-09-27'::date) AND (proposal_id = '47037059-d231-40d9-a0f5-242577596b5c'::uuid))
-> Index Scan using billing_p2022_09_sale_billing_date_sale_id_billing_date_dty_idx on billing_p2022_09 abstractbi0_ (cost=0.43..8.46 rows=1 width=977)
Index Cond: ((sale_billing_date = '2022-09-27'::date) AND (sale_id = sale1_.id) AND (billing_date = '2022-09-27'::date))
Filter: ((dtype)::text = ANY ('{INVOICE_CORRECTED_BEFORE_BILLING,INVOICE,GOODWILL,REFUND,ZERO_SUM}'::text[]))
The second plan does not cause 100% CPU and is about 20x faster than the first one.
But the next day the bad plan is back in place and I see 100% CPU again...
I tried to run ANALYZE periodically (every day) because of my partitioned tables; according to the documentation, autovacuum does not analyze partitioned (parent) tables.
What I have tried (a sketch of the periodic run follows this list):
just autovacuum --> does not work
periodic VACUUM ANALYZE with autovacuum --> does not work
periodic VACUUM ANALYZE without autovacuum --> does not work
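For reference, the periodic run is essentially this kind of thing (a sketch; the partition names are the ones visible in the plans above):
-- Sketch of the daily maintenance run (partition names taken from the plans
-- above). Autovacuum does not analyze the partitioned parent tables
-- themselves, so the parents get an explicit ANALYZE too.
VACUUM ANALYZE billing_p2022_09;
VACUUM ANALYZE sale_p2022_09;
ANALYZE billing;
ANALYZE sale;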
I see only one option now: increase one of the cpu_*_cost factors to force the planner to prefer the second execution plan. But I am a bit scared of doing that, because it would affect all plans.
Any other ideas?
I use Spring/JPA; the query is generated.
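If I do end up touching the cost factors, the least invasive variant I can think of is to scope the change to the batch job's transaction instead of the whole server. An untested sketch (the value and the chosen factor are arbitrary):
-- Untested sketch: change a planner cost factor for one transaction only
-- (0.05 is an arbitrary value; the default cpu_tuple_cost is 0.01).
BEGIN;
SET LOCAL cpu_tuple_cost = 0.05;
-- ... run the generated query here ...
COMMIT;
With Spring/JPA the SET LOCAL would have to be issued on the same connection and transaction as the generated query.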

Related

Speed up PostgreSQL COUNT request with ORDER BY and multiple INNER JOIN

I have a "complex" query, used from a back office only (2 users), that takes around 5s to run. I would like to know if there are any tips to reduce this delay.
There are 5M records in each table.
optimized_all is a varchar and it has a BTREE index.
The ORDER BY seems to be the main cause of the delay. When I remove it, it's 80ms...
The website is on a dedicated server.
work_mem is currently set to 10MB in postgresql.conf.
The query:
SELECT
    optimized_all,
    COUNT(optimized_all) AS count_optimized_all
FROM
    "usr_drinks"
    INNER JOIN usr_seasons ON usr_seasons.drink_id = usr_drinks.id
    INNER JOIN usr_photos ON usr_photos.season_id = usr_seasons.id
        AND (usr_photos.verified_kind = 1 OR usr_photos.verified_kind = 0)
WHERE
    usr_drinks.optimized_type_id = 1
    AND usr_drinks.optimized_status = 1
    AND usr_seasons.verified_at IS NULL
GROUP BY
    usr_drinks.optimized_all
ORDER BY
    count_optimized_all DESC
LIMIT 10;
Explain Analyze:
Limit (cost=150022.12..150022.12 rows=1 width=194) (actual time=4813.137..4923.631 rows=1 loops=1)
-> Sort (cost=150022.12..150111.98 rows=35945 width=194) (actual time=4813.136..4923.629 rows=1 loops=1)
Sort Key: (count(usr_drinks.optimized_all)) DESC
Sort Method: top-N heapsort Memory: 25kB
-> Finalize GroupAggregate (cost=144716.68..149842.39 rows=35945 width=194) (actual time=3675.407..4881.022 rows=314695 loops=1)
Group Key: usr_drinks.optimized_all
-> Gather Merge (cost=144716.68..149297.46 rows=37096 width=101) (actual time=3675.400..4799.409 rows=462144 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=143716.62..143878.91 rows=9274 width=101) (actual time=3647.837..3914.241 rows=92429 loops=5)
Group Key: usr_drinks.optimized_all
-> Sort (cost=143716.62..143739.80 rows=9274 width=93) (actual time=3647.828..3867.945 rows=161362 loops=5)
Sort Key: usr_drinks.optimized_all
Sort Method: external merge Disk: 18848kB
Worker 0: Sort Method: external merge Disk: 16016kB
Worker 1: Sort Method: external merge Disk: 16016kB
Worker 2: Sort Method: external merge Disk: 16008kB
Worker 3: Sort Method: external merge Disk: 15752kB
-> Nested Loop (cost=1.30..143105.51 rows=9274 width=93) (actual time=12.400..3077.821 rows=161362 loops=5)
-> Nested Loop (cost=0.86..104531.30 rows=48751 width=109) (actual time=1.882..1242.603 rows=172132 loops=5)
-> Parallel Index Scan using usr_drinks_on_optimized_type_idx on usr_drinks (cost=0.43..35406.66 rows=44170 width=109) (actual time=0.097..216.641 rows=196036 loops=5)
Index Cond: (optimized_type_id = 1)
Filter: (optimized_status = 1)
Rows Removed by Filter: 9387
-> Index Scan using usr_seasons_on_drink_id_idx on usr_seasons (cost=0.43..1.54 rows=2 width=32) (actual time=0.005..0.005 rows=1 loops=980181)
Index Cond: (drink_id = usr_drinks.id)
Filter: (verified_at IS NULL)
Rows Removed by Filter: 0
-> Index Scan using usr_photos_on_season_id_idx on usr_photos (cost=0.43..0.78 rows=1 width=16) (actual time=0.008..0.010 rows=1 loops=860662)
Index Cond: (season_id = usr_seasons.id)
Filter: ((verified_kind = 1) OR (verified_kind = 0))
Rows Removed by Filter: 1
Planning Time: 1.120 ms
Execution Time: 4927.502 ms
Possible solution?
Storing the count in another table, but for my needs it seems quite complicated to keep the counters updated. Any new ideas are welcome.
EDIT 1: I removed the 2 unnecessary INNER JOINs. Now there are only 2.
EDIT 2: I tried to replace the last 2 INNER JOINs with a double EXISTS condition. I saved only 1 second (the query now takes 4 seconds instead of 5).
SELECT
    optimized_all,
    COUNT(optimized_all) AS count_optimized_all
FROM
    "usr_drinks"
WHERE
    usr_drinks.optimized_type_id = 1
    AND usr_drinks.optimized_status = 1
    AND EXISTS (
        SELECT *
        FROM usr_seasons
        WHERE usr_seasons.drink_id = usr_drinks.id
            AND usr_seasons.verified_at IS NULL
            AND EXISTS (
                SELECT *
                FROM usr_photos
                WHERE usr_photos.season_id = usr_seasons.id
                    AND (usr_photos.verified_kind = 1 OR usr_photos.verified_kind = 0)))
GROUP BY
    usr_drinks.optimized_all
ORDER BY
    count_optimized_all DESC
LIMIT 10;
EDIT 3: the current postgresql.conf settings are:
max_connections = 100
shared_buffers = 6GB
effective_cache_size = 18GB
maintenance_work_mem = 1536MB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 12
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
Increasing work_mem, even to 256MB, doesn't help (probably because my disk is an SSD?).
You are facing a structural problem of PostgreSQL, which is unable to optimize COUNT or SUM queries well. This comes from PostgreSQL's internal architecture, in particular the way it handles MVCC (Multi-Version Concurrency Control).
Take a look at the article I wrote about it.
The only way around this problem is to use a materialized view.
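Something along these lines, as a sketch built from the query in the question (the view name is invented; adapt it as needed):
-- Sketch of the materialized-view approach; the view name is invented.
CREATE MATERIALIZED VIEW drink_optimized_all_counts AS
SELECT
    optimized_all,
    COUNT(optimized_all) AS count_optimized_all
FROM
    "usr_drinks"
    INNER JOIN usr_seasons ON usr_seasons.drink_id = usr_drinks.id
    INNER JOIN usr_photos ON usr_photos.season_id = usr_seasons.id
        AND (usr_photos.verified_kind = 1 OR usr_photos.verified_kind = 0)
WHERE
    usr_drinks.optimized_type_id = 1
    AND usr_drinks.optimized_status = 1
    AND usr_seasons.verified_at IS NULL
GROUP BY
    usr_drinks.optimized_all;

-- A unique index lets REFRESH ... CONCURRENTLY run without blocking readers.
CREATE UNIQUE INDEX ON drink_optimized_all_counts (optimized_all);

-- Refresh periodically, e.g. from cron:
REFRESH MATERIALIZED VIEW CONCURRENTLY drink_optimized_all_counts;

-- The back office then reads:
SELECT * FROM drink_optimized_all_counts
ORDER BY count_optimized_all DESC LIMIT 10;
The counts are only as fresh as the last refresh, which is the trade-off of this approach.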
As I didn't find a way to massively speed up the query, due to the ORDER BY count delay, this is what I did:
I created a new table that stores the optimized_all field with the corresponding optimized_all_count - I didn't want to do this at first, but it was only about 3 hours of work for me.
I run a task once a day that fills this table with INSERT...SELECT... and the long query (it's a Rails app).
Now I just search in this new table... it takes only a few milliseconds, of course.
This is completely acceptable for my needs (an admin tool), but may not suit other scenarios.
Thanks to everybody for your suggestions.
Try creating the partial index below on usr_photos:
CREATE INDEX user_photos_session_id_partial_ix
ON usr_photos (season_id)
WHERE (verified_kind = 1) OR (verified_kind = 0);
It should reduce the query time by about 1.5 seconds.

Postgres not using index to sort data

I have a table learners which has around 3.2 million rows. This table contains user-related information like name and email. I need to optimize some queries that use ORDER BY on some columns. So for testing I have created a temp_learners table with 0.8 million rows. I have created two indexes on this table:
CREATE UNIQUE INDEX "temp_learners_companyId_userId_idx"
ON temp_learners ("companyId" ASC, "userId" ASC, "learnerUserName" ASC, "learnerEmailId" ASC);
and
CREATE INDEX temp_learners_company_name_email_index
ON temp_learners ("companyId", "learnerUserName", "learnerEmailId");
The second index is just for testing.
Now when I run this query:
SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431 AND "userId" IN (
4990609084216745771,
4990610022492247987,
4990609742667096366,
4990609476136523663,
5451985767018841230,
5451985767078553638,
5270390122102920730,
4763688819142650938,
5056979692501246449,
5279569274741647114,
5031660827132289520,
4862889373349389098,
5299864070077160421,
4740222596778406913,
5320170488686569878,
5270367618320474818,
5320170488587895729,
5228888485293847415,
4778050469432720821,
5270392314970177842,
4849087862439244546,
5270392117430427860,
5270351184072717902,
5330263074228870897,
4763688829301614114,
4763684609695916489,
5270390232949727716
) ORDER BY "learnerUserName","learnerEmailId";
The query plan used by the DB is this:
Sort (cost=138.75..138.76 rows=4 width=1581) (actual time=0.169..0.171 rows=27 loops=1)
" Sort Key: ""learnerUserName"", ""learnerEmailId"""
Sort Method: quicksort Memory: 73kB
-> Index Scan using "temp_learners_companyId_userId_idx" on temp_learners (cost=0.55..138.71 rows=4 width=1581) (actual time=0.018..0.112 rows=27 loops=1)
" Index Cond: ((""companyId"" = '909666665757230431'::bigint) AND (""userId"" = ANY ('{4990609084216745771,4990610022492247987,4990609742667096366,4990609476136523663,5451985767018841230,5451985767078553638,5270390122102920730,4763688819142650938,5056979692501246449,5279569274741647114,5031660827132289520,4862889373349389098,5299864070077160421,4740222596778406913,5320170488686569878,5270367618320474818,5320170488587895729,5228888485293847415,4778050469432720821,5270392314970177842,4849087862439244546,5270392117430427860,5270351184072717902,5330263074228870897,4763688829301614114,4763684609695916489,5270390232949727716}'::bigint[])))"
Planning time: 0.116 ms
Execution time: 0.191 ms
In this plan it does not use an index for the sort.
But when I run this query
SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431
ORDER BY "learnerUserName","learnerEmailId" limit 500;
This query uses the index for the sorting.
Limit (cost=0.42..1360.05 rows=500 width=1581) (actual time=0.018..0.477 rows=500 loops=1)
-> Index Scan using temp_learners_company_name_email_index on temp_learners (cost=0.42..332639.30 rows=122327 width=1581) (actual time=0.018..0.442 rows=500 loops=1)
Index Cond: ("companyId" = '909666665757230431'::bigint)
Planning time: 0.093 ms
Execution time: 0.513 ms
What I am not able to understand is why Postgres does not use the index in the first query. Also, I want to point out that the normal use case of this learners table is to join it with other tables, so the first query I wrote is closer to the join scenario. For example,
SELECT *
FROM temp_learners AS l
INNER JOIN entity_learners_basic AS elb
ON l."companyId" = elb."companyId" AND l."userId" = elb."userId"
WHERE l."companyId" = 909666665757230431 AND elb."gameId" = 1050403501267716928
ORDER BY "learnerUserName", "learnerEmailId" limit 5000;
Even after adjusting the indexes, the query plan does not use an index for the sort.
QUERY PLAN
Limit (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.554..173.135 rows=5000 loops=1)
-> Sort (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.553..172.791 rows=5000 loops=1)
" Sort Key: l.""learnerUserName"", l.""learnerEmailId"""
Sort Method: external merge Disk: 35416kB
-> Nested Loop (cost=1.12..3783.91 rows=44 width=1767) (actual time=0.019..63.743 rows=21195 loops=1)
-> Index Scan using primary_index__entity_learners_basic on entity_learners_basic elb (cost=0.57..1109.79 rows=314 width=186) (actual time=0.010..6.221 rows=21195 loops=1)
Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("gameId" = '1050403501267716928'::bigint))
-> Index Scan using "temp_learners_companyId_userId_idx" on temp_learners l (cost=0.55..8.51 rows=1 width=1581) (actual time=0.002..0.002 rows=1 loops=21195)
Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("userId" = elb."userId"))
Planning time: 0.309 ms
Execution time: 178.422 ms
Does Postgres not use indexes when joining and ordering data?
PostgreSQL chooses the plan it thinks will be faster. Using the index that provides rows in the correct order means using a much less selective index, so it doesn't think that will be faster overall.
If you want to force PostgreSQL into believing that sorting is the worst thing in the world, you could set enable_sort=off. If it still sorts after that, then you know PostgreSQL doesn't have the right indexes to avoid sorting, as opposed to just thinking they will not actually be faster.
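For example, something like this in a test session (a sketch; not something to leave enabled):
-- Sketch: check whether an index-based, sort-free plan exists at all.
SET enable_sort = off;   -- session-local, for testing only
EXPLAIN ANALYZE
SELECT *
FROM temp_learners AS l
INNER JOIN entity_learners_basic AS elb
    ON l."companyId" = elb."companyId" AND l."userId" = elb."userId"
WHERE l."companyId" = 909666665757230431 AND elb."gameId" = 1050403501267716928
ORDER BY "learnerUserName", "learnerEmailId" LIMIT 5000;
RESET enable_sort;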
PostgreSQL could use an index on ("companyId", "learnerUserName", "learnerEmailId") for your first query, but the additional IN condition reduces the number of result rows to an estimated 4 rows, which means that the sort won't cost anything at all. So it chooses to use the index that can support the IN condition.
Rows returned with that index won't be in the correct order automatically, because
you specified DESC for the last index column, but ASC for the preceding one
you have more than one element in the IN list.
Without the IN condition, enough rows are returned, so that PostgreSQL thinks that it is cheaper to order by the index and filter out rows that don't match the condition.
With your first query, it is impossible to have an index that supports both the IN list in the WHERE condition and the ORDER BY clause, so PostgreSQL has to make a choice.

Forcing PG to use index with timerange. Works on small resultset, doesn't work on bigger sets

Edited: added Explain Analyze
I've got the following table (simplified for example):
CREATE TABLE public.streamscombined
(
    eventtype text COLLATE pg_catalog."default",
    payload jsonb,
    clienttime bigint  -- millis since epoch
);
And a b-tree compound index on clienttime + eventtype
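That is, roughly (a sketch; the index name and column order are taken from the plans below):
-- Sketch of the compound index as described; name and column order as
-- implied by the plans below.
CREATE INDEX "clienttime/type"
    ON public.streamscombined (clienttime, eventtype);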
Correct use of index when index prunes a lot of rows
A query of the following form correctly uses the index when the clienttime condition filters out a lot of rows, e.g.:
explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= 1522550900000 order by clienttime;
=>
Index Scan using "clienttime/type" on streamscombined (cost=0.56..1781593.82 rows=1135725 width=583)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
Explain Analyze
Index Scan using "clienttime/type" on streamscombined (cost=0.56..1711616.01 rows=1079021 width=592) (actual time=1.369..13069.861 rows=1074896 loops=1)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
Planning time: 0.208 ms
Execution time: 13369.330 ms
RESULT: streaming the results, I see data coming in within 100ms.
Ignoring the index when it prunes fewer rows
However, it completely falls flat when relaxing the clienttime condition, e.g. adding 3 hours:
explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= (1522540900000 + (3*3600*1000)) order by clienttime;
=>
Gather Merge (cost=2897003.10..3192254.78 rows=2530552 width=583)
Workers Planned: 2
-> Sort (cost=2896003.07..2899166.26 rows=1265276 width=583)
Sort Key: clienttime
-> Parallel Seq Scan on streamscombined (cost=0.00..2110404.89 rows=1265276 width=583)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
Explain analyze
Gather Merge (cost=2918263.39..3193771.83 rows=2361336 width=592) (actual time=72505.138..75142.127 rows=2852704 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=2917263.37..2920215.04 rows=1180668 width=592) (actual time=70764.052..71430.200 rows=950901 loops=3)
Sort Key: clienttime
Sort Method: external merge Disk: 722336kB
-> Parallel Seq Scan on streamscombined (cost=0.00..2176719.08 rows=1180668 width=592) (actual time=0.451..57458.888 rows=950901 loops=3)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
Rows Removed by Filter: 7736119
Planning time: 0.109 ms
Execution time: 76164.816 ms
RESULT: streaming the results, I've waited for > 5 minutes without anything arriving.
This is likely because PG believes the index will not prune the result set that much, so it uses a different strategy.
However, and this is key, it seems to completely ignore the fact that I want to order by clienttime and the index gives me that for free.
Is there any way to force PG to use the index independent on the actual value for the clienttime-condition?
Sorting the result is cheap; the index scan is expensive because it does many disk seeks.
A lower setting of random_page_cost reduces the cost estimate for the index scan, resulting in index scans being used for larger result sets.
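For example (a sketch; ~1.1 is a common value for SSD-backed storage, the default is 4.0):
-- Sketch: test a lower random_page_cost in the current session first.
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM streamscombined
WHERE eventtype = 'typeA' AND clienttime <= (1522540900000 + (3*3600*1000))
ORDER BY clienttime;
-- If the plan switches to the index scan and is actually faster, persist
-- the setting in postgresql.conf or via ALTER DATABASE / ALTER SYSTEM.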

PostgreSQL join fetch all rows in table, too slow

I have two tables, commissions and mt4_trades. In mt4_trades the ticket column is the primary key; in commissions there is order_id, which relates to mt4_trades.ticket as one-to-many (one ticket to many order_id). And I have this statement:
SELECT commissions.ibs_account AS ibs_account
FROM "public"."mt4_trades"
INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket
WHERE "mt4_trades"."close_time" >= '2014.11.01'
AND "mt4_trades"."close_time" < '2014.12.01'
The commissions table contains about 4 million rows. This statement returns 480000 rows, but it is too slow: execution time is 9 seconds. I ran EXPLAIN ANALYZE:
Hash Join (cost=43397.07..216259.97 rows=144233 width=7) (actual time=3993.839..9459.896 rows=488131 loops=1)
Hash Cond: (commissions.order_id = mt4_trades.ticket)
-> Seq Scan on commissions (cost=0.00..116452.08 rows=3997708 width=15) (actual time=0.005..4185.254 rows=3997157 loops=1)
-> Hash (cost=42485.10..42485.10 rows=72958 width=4) (actual time=288.767..288.767 rows=97260 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 3420kB
-> Index Scan using "INDEX_CLOSETIME" on mt4_trades (cost=0.43..42485.10 rows=72958 width=4) (actual time=0.020..174.810 rows=97260 loops=1)
Index Cond: ((close_time >= '2014-11-01 00:00:00'::timestamp without time zone) AND (close_time < '2014-12-01 00:00:00'::timestamp without time zone))
Total runtime: 9881.979 ms
This row:
-> Seq Scan on commissions (cost=0.00..116452.08 rows=3997708 width=15) (actual time=0.005..4185.254 rows=3997157 loops=1)
means that it scans the whole commissions table instead of comparing order_id and ticket first.
Can you help me improve this query? Thanks.
9 seconds to return half a million rows is not terrible, and a sequential scan on 4M rows could be much faster than 100K indexed lookups against 4M rows. Assuming you've got an index on order_id, you can test this by running set enable_seqscan TO false; before running your query (this will only affect the current connection).
Do you really need all 500K rows every time you run this query? Or are you going to be filtering the results? If you're almost always going to be filtering the results in some other way, you probably want to optimize against that query rather than the one that returns all 500K rows.
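For example (a sketch; the index name is invented, and the CREATE INDEX is only needed if order_id isn't indexed yet):
-- Skip this if commissions.order_id already has an index (name is invented).
CREATE INDEX commissions_order_id_idx ON commissions (order_id);

-- Compare plans with the sequential scan disabled for this connection only.
SET enable_seqscan TO false;
EXPLAIN ANALYZE
SELECT commissions.ibs_account AS ibs_account
FROM "public"."mt4_trades"
INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket
WHERE "mt4_trades"."close_time" >= '2014.11.01'
  AND "mt4_trades"."close_time" < '2014.12.01';
RESET enable_seqscan;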

Postgresql Update command is too slow

I have the following postgresql table:
CREATE TABLE "initialTable" (
"paramIDFKey" integer,
"featAIDFKey" integer,
"featBIDFKey" integer,
"featAPresent" boolean,
"featBPresent" boolean,
"dataID" text
);
I update this table with the following command:
UPDATE "initialTable"
SET "dataID" = "dataID" || '#' || 'NEWDATA'
where
"paramIDFKey" = parameterID
and "featAIDFKey" = featAIDFKey
and "featBIDFKey" = featBIDFKey
and "featAPresent" = featAPresent
and "featBPresent" = featBPresent
As you can see, I am updating the dataID for each row. This update works as an append: it appends new data to the previous value.
This is too slow, especially when the "dataID" column gets larger.
These are the EXPLAIN results:
"Bitmap Heap Scan on "initialTable" (cost=4.27..8.29 rows=1 width=974)"
" Recheck Cond: (("paramIDFKey" = 53) AND ("featAIDFKey" = 0) AND ("featBIDFKey" = 95))"
" Filter: ("featAPresent" AND (NOT "featBPresent"))"
" -> Bitmap Index Scan on "InexactIndex" (cost=0.00..4.27 rows=1 width=0)"
" Index Cond: (("paramIDFKey" = 53) AND ("featAIDFKey" = 0) AND ("featBIDFKey" = 95) AND ("featAPresent" = true) AND ("featBPresent" = false))"
EXPLAIN ANALYZE:
"Bitmap Heap Scan on "Inexact2Comb" (cost=4.27..8.29 rows=1 width=974) (actual time=0.621..0.675 rows=1 loops=1)"
" Recheck Cond: (("paramIDFKey" = 53) AND ("featAIDFKey" = 0) AND ("featBIDFKey" = 95))"
" Filter: ("featAPresent" AND (NOT "featBPresent"))"
" -> Bitmap Index Scan on "InexactIndex" (cost=0.00..4.27 rows=1 width=0) (actual time=0.026..0.026 rows=1 loops=1)"
" Index Cond: (("paramIDFKey" = 53) AND ("featAIDFKey" = 0) AND ("featBIDFKey" = 95) AND ("featAPresent" = true) AND ("featBPresent" = false))"
"Total runtime: 13.780 ms"
and the version:
"PostgreSQL 8.4.14, compiled by Visual C++ build 1400, 32-bit"
Is there any suggestion?
First, I am not at all convinced this is a real problem. If 15ms is too long for a single query, you need to start by asking whether you are optimizing prematurely and whether this is really the bottleneck. If it is, you may want to reconsider how you are using your database. Keep in mind that queries can execute faster than EXPLAIN ANALYZE suggests (I have seen some queries run 4x slower under EXPLAIN ANALYZE). So start by profiling your application and look for real bottlenecks.
This being said, if you do find this is a bottleneck, you could take a close look at what is indexed. Too many indexes slow down write operations, and that matters for UPDATE queries. This may mean adding a new index with all the columns used in the UPDATE and removing other indexes as needed. However, I really don't think you are going to get a lot more out of that query.
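As a starting point, you can list what is actually indexed on the table (a sketch, using the table name from your CREATE TABLE; adjust the schema if needed):
-- Every index listed here has to be maintained whenever an UPDATE changes
-- one of its columns.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'initialTable';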