I have a PostgreSQL database on Google Cloud SQL:
4 vCPUS
15 GB RAM
55 GB SSD
The relevant schema is:
postgres=> \d device;
Table "public.device"
Column | Type | Collation | Nullable | Default
-------------------------+-------------------------+-----------+----------+---------
id | uuid | | not null |
brand | character varying(255) | | |
model | character varying(255) | | |
serialnumber | character varying(255) | | |
[...]
Indexes:
"device_pkey" PRIMARY KEY, btree (id)
[...]
Referenced by:
TABLE "application" CONSTRAINT "fk_application_device_id_device" FOREIGN KEY (device_id) REFERENCES device(id) ON DELETE CASCADE
[...]
postgres=> \d application;
Table "public.application"
Column | Type | Collation | Nullable | Default
----------------------------+------------------------+-----------+----------+---------
id | uuid | | not null |
device_id | uuid | | not null |
packagename | character varying(255) | | |
versionname | character varying(255) | | |
[...]
Indexes:
"application_pkey" PRIMARY KEY, btree (id)
"application_device_id_packagename_key" UNIQUE CONSTRAINT, btree (device_id, packagename)
Foreign-key constraints:
"fk_application_device_id_device" FOREIGN KEY (device_id) REFERENCES device(id) ON DELETE CASCADE
[...]
Row counts:
device table: 16k rows
application table: 3.6M rows
When trying something as simple as:
select count(id) from application;
The query took 900 seconds (sic) to count those 3.6M rows.
Here is the execution plan:
postgres=> explain analyze select count(id) from application;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1245180.18..1245180.19 rows=1 width=8) (actual time=311470.250..311496.933 rows=1 loops=1)
-> Gather (cost=1245179.96..1245180.17 rows=2 width=8) (actual time=311470.225..311496.919 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1244179.96..1244179.97 rows=1 width=8) (actual time=311463.287..311463.289 rows=1 loops=3)
-> Parallel Seq Scan on application (cost=0.00..1234885.77 rows=3717677 width=16) (actual time=79.783..311296.505 rows=1202169 loops=3)
Planning Time: 0.083 ms
Execution Time: 311497.021 ms
(8 rows)
It seems like everything (keys, indexes) is set up correctly, so what could be the reason for this simple query taking so long?
You have to look deeper to determine the cause:
turn on track_io_timing in the PostgreSQL configuration so that you can see how long I/O takes
use EXPLAIN (ANALYZE, BUFFERS) to see how many 8kB blocks are touched
If the number of blocks is very high, your table is bloated (consists mostly of empty space), and the sequential scan takes so long because it has to read all that empty space. VACUUM (FULL) can help with that.
If the block count is as you would expect, the problem is that your storage is too slow.
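A minimal sketch of those two diagnostic steps (standard PostgreSQL syntax; note that changing track_io_timing per session requires superuser, and on Google Cloud SQL it is usually enabled through a database flag instead):

-- show I/O timings and how many 8kB blocks the count actually reads
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM application;

-- compare the physical table size against what ~3.6M narrow rows should occupy
SELECT pg_size_pretty(pg_table_size('application'));

If the table is far larger than the row count justifies, that confirms bloat; VACUUM (FULL) application; rewrites the table and reclaims the space, but it takes an exclusive lock while it runs.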
Related
I have a table as following schema:
location=# \d locations
Table "public.locations"
Column | Type | Modifiers
-----------+--------------------------+--------------------------------------------------------
id | integer | not null default nextval('locations_id_seq'::regclass)
phone | text | not null
longitude | text | not null
latitude | text | not null
date | text | not null
createdAt | timestamp with time zone |
updatedAt | timestamp with time zone |
Indexes:
"locations_pkey" PRIMARY KEY, btree (id)
"createdAt_idx" btree ("createdAt")
"phone_idx" btree (phone)
and it has 14,928,439 rows:
location=# select count(*) from locations;
count
----------
14928439
I have an HTTP API for querying a user's latest uploaded coordinate by phone number, but this SQL query is slow: select * from "locations" where "phone" = '15828354860' order by "createdAt" desc limit 1;
And then I EXPLAIN it:
location=# EXPLAIN ANALYZE select * from "locations" where "phone" = '15828354860' order by "createdAt" desc limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..5.22 rows=1 width=74) (actual time=4779.584..4779.584 rows=1 loops=1)
-> Index Scan Backward using "createdAt_idx" on locations (cost=0.43..663339.70 rows=138739 width=74) (actual time=4779.583..4779.583 rows=1 loops=1)
Filter: (phone = '15828354860'::text)
Rows Removed by Filter: 2027962
Planning time: 0.101 ms
Execution time: 4779.612 ms
(6 rows)
It takes 4.7 s to execute. How can I improve the query speed?
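The plan shows that the backward scan over "createdAt_idx" has to discard about two million non-matching rows before it finds the first row for that phone number. The usual fix is a composite index that matches both the filter and the sort; a sketch (the index name here is just an illustration):

CREATE INDEX locations_phone_createdat_idx ON locations (phone, "createdAt" DESC);

With such an index, the query can descend straight to the newest entry for the given phone number and stop after reading one row.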
PostgreSQL 9.5.7
I have created an index for username.
Now I do:
explain analyze select * from authentification_user where username='Pele';
And I get:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on authentification_user (cost=0.00..1.25 rows=1 width=478) (actual time=0.017..0.019 rows=1 loops=1)
Filter: ((username)::text = 'Pele'::text)
Rows Removed by Filter: 19
Planning time: 0.101 ms
Execution time: 0.040 ms
(5 rows)
The problem is: why is it a Seq Scan? It is supposed to be an Index Scan. Could you give me a hint here?
simple_project=# \d authentification_user
Table "public.authentification_user"
Column | Type | Modifiers
----------+------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('authentification_user_id_seq'::regclass)
username | character varying(10) | not null
pw_hash | character varying(100) | not null
email | character varying(100) | not null
Indexes:
"authentification_user_pkey" PRIMARY KEY, btree (id)
"authentification_user_username_e9ac7af7" btree (username)
"authentification_user_username_e9ac7af7_like" btree (username varchar_pattern_ops)
simple_project=#
There's a good Postgres Wiki page about index vs. sequential scan usage that explains this.
In your case, there is simply not enough data to gain any benefit from an Index Scan.
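With only 20 rows the whole table fits in a single 8kB page, so reading that page sequentially is cheaper than going through the index first. If you want to verify that the index would be used when the sequential scan is artificially made expensive, a quick session-level experiment (standard planner settings, nothing specific to this schema) is:

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM authentification_user WHERE username = 'Pele';
RESET enable_seqscan;

Once the table holds a realistic amount of data, the planner will switch to the index on its own.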
I'm using Postgres 9.4 on OSX with 16GB of RAM. I am creating a materialized view on my Postgres tables, but it is tremendously slow to create and I'm wondering if there is any way to speed things up.
These are the underlying tables, showing spending items and organisations:
Table "public.spending_item"
Column | Type | Modifiers
-------------------+-------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('spending_item_id_seq'::regclass)
presentation_code | character varying(15) | not null
presentation_name | character varying(1000) | not null
total_items | integer | not null
actual_cost | double precision | not null
processing_date | date | not null
organisation_id | character varying(9) | not null
Table "public.organisation"
Column | Type | Modifiers
-----------+------------------------+-----------
code | character varying(9) | not null
name | character varying(200) | not null
There are 100m rows in the spending_item table, and around 10,000 in the organisation table.
The materialized view I want to create is as follows - basically it's total spending and total items per month, grouped by presentation code and organisation.
(I'm creating it because I need queries like "for organisations with codes like 0304%, give me total spending by month and by organisation", or "for the organisation with code 030405, give me total spending by month and by presentation_code". My data only changes once a month, so I think a materialized view makes sense for this.)
Materialized view "public.vw_spending_summary"
Column | Type | Modifiers
-------------------+-------------------------+-----------
processing_date | date |
organsiation_id | character varying(9) |
organisation_name | character varying(200) |
total_cost | double precision |
total_items | double precision |
presentation_name | character varying(1000) |
presentation_code | character varying(15) |
The query I'm using to create the materialized view is as follows:
SELECT spending_item.processing_date,
spending_item.organisation_id,
organisation.name,
spending_item.presentation_code,
spending_item.presentation_name,
SUM(spending_item.total_items) AS items,
SUM(spending_item.actual_cost) AS cost
FROM organisation, spending_item
WHERE spending_item.organisation_id=organisation.code
GROUP BY spending_item.processing_date, spending_item.presentation_name,
spending_item.presentation_code, spending_item.organisation_id,
organisation.name;
The query has been running for many hours and has not completed yet.
This is the result of an EXPLAIN ANALYZE on a more restrictive query (as above, but with WHERE spending_item.organisation_id like '0101%' added):
GroupAggregate (cost=2263051.70..2291072.45 rows=934025 width=86) (actual time=60890.505..61069.808 rows=180 loops=1)
-> Sort (cost=2263051.70..2265386.76 rows=934025 width=86) (actual time=60890.497..60945.558 rows=397717 loops=1)
Sort Key: spending_item.processing_date, spending_item.presentation_name, spending_item.presentation_code, spending_item.organisation_id, organisation.name
Sort Method: external sort Disk: 32240kB
-> Hash Join (cost=23877.71..2125733.64 rows=934025 width=86) (actual time=180.302..42251.826 rows=397717 loops=1)
Hash Cond: ((spending_item.organisation_id)::text = (organisation.code)::text)
-> Bitmap Heap Scan on spending_item (cost=23276.92..2109954.94 rows=934025 width=68) (actual time=178.521..41743.141 rows=397717 loops=1)
Filter: ((organisation_id)::text ~~ '0201%'::text)
-> Bitmap Index Scan on spending_item_organisation_id_varchar_pattern_ops_idx (cost=0.00..23043.41 rows=924684 width=0) (actual time=136.077..136.077 rows=397717 loops=1)
Index Cond: (((organisation_id)::text ~>=~ '0201'::text) AND ((organisation_id)::text ~<~ '0202'::text))
-> Hash (cost=558.13..558.13 rows=3413 width=27) (actual time=1.774..1.774 rows=3413 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 203kB
-> Seq Scan on organisation (cost=0.00..558.13 rows=3413 width=27) (actual time=0.004..0.991 rows=3413 loops=1)
Total runtime: 61089.664 ms
I'd be grateful for any suggestions on speeding this up. I've already set maintenance_work_mem, shared_buffers, etc. quite high, so I don't think there's much I can do on the settings front.
Or perhaps it would make more sense to have two different materialized views, for the two types of queries I want to run?
Please let me know if there is any other information I can provide.
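One detail worth checking from the plan above: the sort spills to disk ("external sort Disk: 32240kB"), and maintenance_work_mem does not help there, because a plain aggregation query is governed by work_mem. A hedged sketch of what is often tried first; the 1GB value is only an illustration and has to fit within the 16GB of RAM, and the query is the same one as above, just rewritten with an equivalent explicit JOIN:

SET work_mem = '1GB';  -- session-local, does not affect other connections
CREATE MATERIALIZED VIEW vw_spending_summary AS
SELECT spending_item.processing_date,
       spending_item.organisation_id,
       organisation.name,
       spending_item.presentation_code,
       spending_item.presentation_name,
       SUM(spending_item.total_items) AS items,
       SUM(spending_item.actual_cost) AS cost
FROM organisation
JOIN spending_item ON spending_item.organisation_id = organisation.code
GROUP BY spending_item.processing_date, spending_item.presentation_name,
         spending_item.presentation_code, spending_item.organisation_id,
         organisation.name;
RESET work_mem;

Keeping the sort (or a hash aggregate) in memory usually removes a large share of the runtime for this kind of grouping over 100m rows.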
I have a display that lists each user in a city and their last message in a table sorted by the newest messages first:
Users
------
Caleb - Hey what's up?
------
Bill - Is there anything up tonight?
------
Jon - What's up man?
------
Any help optimizing this query below and/or helping figure out where to add indexes would be very helpful.
I could probably denormalize and store last_message_created_at on users but want to avoid that backfill.
Users Table:
Table "public.users"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | integer | not null
role | user_role | not null
last_message_id | integer |
city_id | integer |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"ix_users_city_id" btree (city_id)
"ix_users_last_message_id" btree (last_message_id)
Foreign-key constraints:
"messages_last_message_id_fkey" FOREIGN KEY (last_message_id) REFERENCES messages(id)
"users_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(id)
Referenced by:
TABLE "messages" CONSTRAINT "messages_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES users(id)
TABLE "messages" CONSTRAINT "messages_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES users(id)
Messages Table:
Table "public.messages"
Column | Type | Modifiers
------------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('messages_id_seq'::regclass)
content | character varying |
from_user_id | integer |
to_user_id | integer |
created_at | timestamp without time zone |
Indexes:
"messages_pkey" PRIMARY KEY, btree (id)
"ix_messages_from_user_id" btree (from_user_id)
"ix_messages_to_user_id" btree (to_user_id)
Foreign-key constraints:
"messages_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES users(id)
"messages_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES users(id)
Referenced by:
TABLE "users" CONSTRAINT "messages_last_message_id_fkey" FOREIGN KEY (last_message_id) REFERENCES messages(id)
Query and Plan:
EXPLAIN ANALYZE
SELECT users.city_id AS users_city_id, users.id AS users_id, users.last_message_id AS users_last_message_id
FROM users
JOIN messages ON messages.id = users.last_message_id
WHERE users.city_id = 1 ORDER BY users.last_message_id DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=951606.67..951811.07 rows=81760 width=12) (actual time=1934.501..1998.216 rows=79454 loops=1)
Sort Key: users.last_message_id
Sort Method: quicksort Memory: 6797kB
-> Nested Loop (cost=1575.71..944935.41 rows=81760 width=12) (actual time=26.784..1817.478 rows=79454 loops=1)
-> Bitmap Heap Scan on users (cost=1575.71..33209.21 rows=84040 width=12) (actual time=26.656..393.749 rows=85348 loops=1)
Recheck Cond: (city_id = 1)
-> Bitmap Index Scan on ix_users_city_id (cost=0.00..1554.70 rows=84040 width=0) (actual time=20.679..20.679 rows=85348 loops=1)
Index Cond: (city_id = 1)
-> Index Only Scan using messages_pkey on messages (cost=0.00..10.84 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=85348)
Index Cond: (id = users.last_message_id)
Heap Fetches: 79454
Total runtime: 2058.134 ms
(12 rows)
You never use the messages table in the query for any purpose other than excluding users without messages, so this might be faster:
select
u.city_id as users_city_id,
u.id as users_id,
u.last_message_id as users_last_message_id
from users u
where
u.city_id = 1
and exists (
select 1
from messages
where id = u.last_message_id
)
order by u.last_message_id desc
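If it is still slow, most of the remaining time in the plan is the bitmap heap scan on users plus the final sort. An index that covers both the filter and the sort order, for example (the index name is just an illustration):

create index ix_users_city_id_last_message_id on users (city_id, last_message_id desc);

can let PostgreSQL read the matching users already ordered by last_message_id and skip the separate sort step.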
Our application has a very slow statement; it takes more than 11 seconds, so I want to know whether there is any way to optimize it.
The SQL statement
SELECT id FROM mapfriends.cell_forum_topic WHERE id in (
SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid )
AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
id
---------
2471959
2382296
1535967
2432006
2367281
2159706
1501759
1549304
2179763
1598043
(10 rows)
Time: 11444.976 ms
Plan
friends=> explain SELECT id FROM friends.cell_forum_topic WHERE id in (
friends(> SELECT topicid FROM friends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid)
friends-> AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443.15..1443.15 rows=2 width=12)
-> Sort (cost=1443.15..1443.15 rows=2 width=12)
Sort Key: cell_forum_topic.restoretime
-> Nested Loop (cost=1434.28..1443.14 rows=2 width=12)
-> HashAggregate (cost=1434.28..1434.30 rows=2 width=4)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1430.49 rows=1516 width=4)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
(10 rows)
Time: 1.109 ms
Indexes
friends=> \d cell_forum_item
Table "friends.cell_forum_item"
Column | Type | Modifiers
---------+--------------------------------+--------------------------------------------------------------
id | integer | not null default nextval('cell_forum_item_id_seq'::regclass)
topicid | integer | not null
skyid | integer | not null
content | character varying(200) |
addtime | timestamp(0) without time zone | default now()
ischeck | boolean |
Indexes:
"cell_forum_item_pkey" PRIMARY KEY, btree (id)
"cell_forum_item_idx" btree (topicid, skyid)
"cell_forum_item_idx_1" btree (topicid, id)
"cell_forum_item_idx_skyid" btree (skyid)
friends=> \d cell_forum_topic
Table "friends.cell_forum_topic"
Column | Type | Modifiers
-------------+--------------------------------+--------------------------------------------------------------------------------------
id | integer | not null default nextval(('"friends"."cell_forum_topic_id_seq"'::text)::regclass)
categoryid | integer | not null
topic | character varying | not null
content | character varying | not null
skyid | integer | not null
addtime | timestamp(0) without time zone | default now()
reference | integer | default 0
restore | integer | default 0
restoretime | timestamp(0) without time zone | default now()
locked | boolean | default false
settop | boolean | default false
hidden | boolean | default false
feature | boolean | default false
picid | integer | default 29249
managerid | integer |
imageid | integer | default 0
pass | boolean | default false
ischeck | boolean |
Indexes:
"cell_forum_topic_pkey" PRIMARY KEY, btree (id)
"idx_cell_forum_topic_1" btree (categoryid, settop, hidden, restoretime, skyid)
"idx_cell_forum_topic_2" btree (categoryid, hidden, restoretime, skyid)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
"idx_cell_forum_topic_4" btree (categoryid, hidden, restore)
"idx_cell_forum_topic_5" btree (categoryid, hidden, restoretime, feature)
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
Explain analyze
mapfriends=> explain analyze SELECT id FROM mapfriends.cell_forum_topic
mapfriends-> join (SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid) as tmp
mapfriends-> on mapfriends.cell_forum_topic.id=tmp.topicid
mapfriends-> where categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.006..18016.013 rows=10 loops=1)
-> Sort (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.001..18016.002 rows=10 loops=1)
Sort Key: cell_forum_topic.restoretime
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1438.02..1446.88 rows=2 width=12) (actual time=16988.492..18015.869 rows=20 loops=1)
-> HashAggregate (cost=1438.02..1438.04 rows=2 width=4) (actual time=15446.735..15447.243 rows=610 loops=1)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12) (actual time=4.210..4.210 rows=0 loops=610)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
Total runtime: 18019.461 ms
Could you give us some more information about the tables (the statistics) and the configuration?
SELECT version();
SELECT category, name, setting FROM pg_settings WHERE name IN('effective_cache_size', 'enable_seqscan', 'shared_buffers');
SELECT * FROM pg_stat_user_tables WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stat_user_indexes WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stats WHERE tablename IN('cell_forum_topic', 'cell_forum_item');
And before getting this data, use ANALYZE.
It looks like you have a problem with an index; this is where the query spends all its time:
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
If you use VACUUM FULL on a regular basis (NOT RECOMMENDED!), index bloat might be your problem. A REINDEX might be a good idea, just to be sure:
REINDEX TABLE cell_forum_item;
And talking about indexes, you can drop a couple of them; these are redundant:
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
Other indexes already start with the same columns and can be used by the database instead.
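A sketch of dropping them (assuming the indexes live in the same schema as the table, so no schema qualification is needed):

DROP INDEX idx_cell_forum_topic_3;
DROP INDEX idx_cell_forum_topic_6;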
It looks like you have a couple of problems:
1. Autovacuum is turned off or it is way behind. The last autovacuum was on 2010-12-02 and you have 256734 dead tuples in one table and 451430 dead ones in the other. You have to do something about this; it is a serious problem.
2. When autovacuum is working again, you have to do a VACUUM FULL and a REINDEX to force a table rewrite and get rid of all the empty space in your tables.
3. After fixing the vacuum problem, you also have to ANALYZE: the database expects 1520 results but gets 7133. This could be a statistics problem; maybe you have to increase the statistics target.
4. The query itself needs some rewriting as well: it gets 7133 results but it needs only 610 of them. Over 90% of the results are thrown away, and getting those 7133 takes a lot of time, over 15 seconds. Get rid of the subquery by using a JOIN without the GROUP BY, or use EXISTS, also without the GROUP BY (see the sketch below).
But first get autovacuum back on track, before you get new or other problems.
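A minimal sketch of the EXISTS rewrite suggested in point 4 (same tables and filters as the original statement, untested against your data):

SELECT id
FROM mapfriends.cell_forum_topic t
WHERE t.categoryid = 29
  AND t.hidden = false
  AND EXISTS (
        SELECT 1
        FROM mapfriends.cell_forum_item i
        WHERE i.topicid = t.id
          AND i.skyid = 103230293
      )
ORDER BY t.restoretime DESC
LIMIT 10 OFFSET 0;

The semi-join lets PostgreSQL stop probing cell_forum_item as soon as it finds one matching row per topic, instead of building the full 7133-row aggregate first.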
The problem isn't due to a lack of query-plan caching; it is most likely due to the planner's choice of plan, caused by a lack of appropriate indexes.