I have a table with the following schema:
location=# \d locations
Table "public.locations"
Column | Type | Modifiers
-----------+--------------------------+--------------------------------------------------------
id | integer | not null default nextval('locations_id_seq'::regclass)
phone | text | not null
longitude | text | not null
latitude | text | not null
date | text | not null
createdAt | timestamp with time zone |
updatedAt | timestamp with time zone |
Indexes:
"locations_pkey" PRIMARY KEY, btree (id)
"createdAt_idx" btree ("createdAt")
"phone_idx" btree (phone)
and it has 14928439 rows:
location=# select count(*) from locations;
count
----------
14928439
I have an HTTP API that returns a user's latest uploaded coordinate by phone number, but this SQL query is slow: select * from "locations" where "phone" = '15828354860' order by "createdAt" desc limit 1;
Then I ran EXPLAIN ANALYZE on it:
location=# EXPLAIN ANALYZE select * from "locations" where "phone" = '15828354860' order by "createdAt" desc limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..5.22 rows=1 width=74) (actual time=4779.584..4779.584 rows=1 loops=1)
-> Index Scan Backward using "createdAt_idx" on locations (cost=0.43..663339.70 rows=138739 width=74) (actual time=4779.583..4779.583 rows=1 loops=1)
Filter: (phone = '15828354860'::text)
Rows Removed by Filter: 2027962
Planning time: 0.101 ms
Execution time: 4779.612 ms
(6 rows)
It executes in 4.7 s. How can I improve the query speed?
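The plan shows the backward scan over "createdAt_idx" discarding 2027962 rows that fail the phone filter before it finds a match. The usual fix for this equality-filter-plus-ORDER-BY-LIMIT pattern (the same one suggested for the projects_toolresult question further down this page) is a composite index with the equality column first; the index name below is only illustrative:
CREATE INDEX locations_phone_createdat_idx  -- illustrative name
    ON locations (phone, "createdAt");
A btree can be scanned backward, so this index also serves ORDER BY "createdAt" DESC, and the LIMIT 1 becomes a single index probe for the given phone.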
I have a PostgreSQL DB on Google Cloud SQL:
4 vCPUS
15 GB RAM
55 GB SSD
The relevant schema is :
postgres=> \d device;
Table "public.device"
Column | Type | Collation | Nullable | Default
-------------------------+-------------------------+-----------+----------+---------
id | uuid | | not null |
brand | character varying(255) | | |
model | character varying(255) | | |
serialnumber | character varying(255) | | |
[...]
Indexes:
"device_pkey" PRIMARY KEY, btree (id)
[...]
Referenced by:
TABLE "application" CONSTRAINT "fk_application_device_id_device" FOREIGN KEY (device_id) REFERENCES device(id) ON DELETE CASCADE
[...]
postgres=> \d application;
Table "public.application"
Column | Type | Collation | Nullable | Default
----------------------------+------------------------+-----------+----------+---------
id | uuid | | not null |
device_id | uuid | | not null |
packagename | character varying(255) | | |
versionname | character varying(255) | | |
[...]
Indexes:
"application_pkey" PRIMARY KEY, btree (id)
"application_device_id_packagename_key" UNIQUE CONSTRAINT, btree (device_id, packagename)
Foreign-key constraints:
"fk_application_device_id_device" FOREIGN KEY (device_id) REFERENCES device(id) ON DELETE CASCADE
[...]
Volumetry:
device table: 16k rows
application: 3.6M rows
When trying something as simple as:
select count(id) from application;
The query took 900 seconds (sic) to count those 3.6M rows.
Here is the execution plan:
postgres=> explain analyze select count(id) from application;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1245180.18..1245180.19 rows=1 width=8) (actual time=311470.250..311496.933 rows=1 loops=1)
-> Gather (cost=1245179.96..1245180.17 rows=2 width=8) (actual time=311470.225..311496.919 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1244179.96..1244179.97 rows=1 width=8) (actual time=311463.287..311463.289 rows=1 loops=3)
-> Parallel Seq Scan on application (cost=0.00..1234885.77 rows=3717677 width=16) (actual time=79.783..311296.505 rows=1202169 loops=3)
Planning Time: 0.083 ms
Execution Time: 311497.021 ms
(8 rows)
It seems like everything (keys and indexes) is correctly set up, so what could be the reason for this simple query to take so long?
You have to look deeper to determine the cause:
turn on track_io_timing in the PostgreSQL configuration so that you can see how long I/O takes
use EXPLAIN (ANALYZE, BUFFERS) to see how many 8kB blocks are touched
If the number of blocks is very high, your table is bloated (it consists mostly of empty space), and the sequential scan takes so long because it has to read all that empty space. VACUUM (FULL) can help with that.
If the block count is as you would expect, the problem is that your storage is too slow.
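A minimal diagnostic session along these lines (plain SQL; on a managed service like Cloud SQL, track_io_timing may have to be enabled through the instance flags rather than SET, since it normally requires superuser rights):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM application;
If the Buffers: line reports far more 8 kB blocks read than 3.6M narrow rows could plausibly occupy, bloat is the likely culprit, and VACUUM (FULL) application; will rewrite the table compactly (it takes an exclusive lock while it runs).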
I have the following table in my database:
business_db_dev=# \d schedules2
Table "public.schedules2"
Column | Type | Collation | Nullable | Default
-------------+--------------------------------+-----------+----------+----------------------------------------
id | bigint | | not null | nextval('schedules2_id_seq'::regclass)
monday | boolean | | not null |
tuesday | boolean | | not null |
wednesday | boolean | | not null |
thursday | boolean | | not null |
friday | boolean | | not null |
saturday | boolean | | not null |
sunday | boolean | | not null |
start1 | time(0) without time zone | | |
end1 | time(0) without time zone | | |
start2 | time(0) without time zone | | |
end2 | time(0) without time zone | | |
user_id | bigint | | not null |
inserted_at | timestamp(0) without time zone | | not null |
updated_at | timestamp(0) without time zone | | not null |
Indexes:
"schedules2_pkey" PRIMARY KEY, btree (id)
"schedules2_start1_end1_DESC_NULLS_LAST_index" btree (start1, end1 DESC NULLS LAST)
"schedules2_start2_end2_DESC_NULLS_LAST_index" btree (start2, end2 DESC NULLS LAST)
"schedules2_user_id_index" UNIQUE, btree (user_id)
Foreign-key constraints:
"schedules2_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
I also have other tables that I join with this one (users and strategies), which I will not post here for brevity; if they are needed, just ask and I will update the question with their structures too.
Given this table, I'm trying to run the following query:
select u.token
from strategies as st
inner join users as u on (st.user_id = u.id)
inner join schedules2 as sc on (st.user_id = sc.user_id)
where st.exchange = 'binance'
and st.market_pair = 'btc_usdt'
and st.timeframe = 'five_minutes'
and st.name = 'stoch_oscillator'
and st.inputs = '{5,3,3,80,20}'
and (sc.start1 is null or ('13:00:01'::time between sc.start1 and sc.end1) or ('13:00:01'::time between sc.start2 and sc.end2));
Running this query with EXPLAIN ANALYZE, I got this result:
Nested Loop (cost=1.27..215.56 rows=16 width=6) (actual time=0.076..6.050 rows=942 loops=1)
Join Filter: (st.user_id = u.id)
-> Nested Loop (cost=0.98..197.89 rows=17 width=16) (actual time=0.070..3.650 rows=942 loops=1)
-> Index Only Scan using unique_strategy_and_user_id on strategies st (cost=0.69..7.29 rows=80 width=8) (actual time=0.056..1.083 rows=1000 loops=1)
Index Cond: ((exchange = 'binance'::text) AND (market_pair = 'btc_usdt'::text) AND (timeframe = 'five_minutes'::text) AND (name = 'stoch_oscillator'::text) AND (inputs = '{5,3,3,80,20}'::character varying[]))
Heap Fetches: 0
-> Index Scan using schedules2_user_id_index on schedules2 sc (cost=0.29..2.38 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (user_id = st.user_id)
Filter: ((start1 IS NULL) OR (('13:00:01'::time without time zone >= start1) AND ('13:00:01'::time without time zone <= end1)) OR (('13:00:01'::time without time zone >= start2) AND ('13:00:01'::time without time zone <= end2)))
Rows Removed by Filter: 0
-> Index Scan using users_pkey on users u (cost=0.29..1.03 rows=1 width=14) (actual time=0.002..0.002 rows=1 loops=942)
Index Cond: (id = sc.user_id)
Planning Time: 0.834 ms
Execution Time: 6.130 ms
The important part is this one:
Index Scan using schedules2_user_id_index on schedules2 sc (cost=0.29..2.38 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (user_id = st.user_id)
Filter: ((start1 IS NULL) OR (('13:00:01'::time without time zone >= start1) AND ('13:00:01'::time without time zone <= end1)) OR (('13:00:01'::time without time zone >= start2) AND ('13:00:01'::time without time zone <= end2)))
As you can see, Postgres is using Filter to check the values start1, end1, start2 and end2, but I was expecting that Postgres would use the two indexes I created for this exact condition:
"schedules2_start1_end1_DESC_NULLS_LAST_index" btree (start1, end1 DESC NULLS LAST)
"schedules2_start2_end2_DESC_NULLS_LAST_index" btree (start2, end2 DESC NULLS LAST)
Removing the join with the schedules2 table and its condition basically halves the query time.
So, my question is: why is Postgres using a Filter instead of my indexes, and how can I change the query or the indexes themselves to optimize this query?
Edit: Note that the values used in the query (like '13:00:01'::time) are just examples; in my system they can be anything.
Index Cond: (user_id = st.user_id)
Filter: ....
Rows Removed by Filter: 0
The index it is already using is already optimal. It found no extra rows that had to be removed by the filter (or at least so few that they rounded to zero). How could that be improved upon by using more indexes?
I have a simple query that must get one record from a table with about 14M records:
EXPLAIN ANALYZE SELECT "projects_toolresult"."id",
       "projects_toolresult"."tool_id",
       "projects_toolresult"."status",
       "projects_toolresult"."updated_at",
       "projects_toolresult"."created_at"
FROM "projects_toolresult"
WHERE ("projects_toolresult"."status" = 1
       AND "projects_toolresult"."tool_id" = 21)
ORDER BY "projects_toolresult"."updated_at" DESC
LIMIT 1;
It is weird that when I order the query by the updated_at field, it takes almost 60 seconds to execute:
Limit (cost=0.43..510.94 rows=1 width=151) (actual time=56754.932..56754.932 rows=0 loops=1)
  -> Index Scan using projects_to_updated_266459_idx on projects_toolresult (cost=0.43..1800549.09 rows=3527 width=151) (actual time=56754.930..56754.930 rows=0 loops=1)
       Filter: ((status = 1) AND (tool_id = 21))
       Rows Removed by Filter: 13709343
Planning time: 0.236 ms
Execution time: 56754.968 ms
(6 rows)
It does not matter whether it is ASC or DESC. But if I do ORDER BY random(), or use no ORDER BY at all:
Limit (cost=23496.10..23496.10 rows=1 width=151) (actual time=447.532..447.532 rows=0 loops=1)
-> Sort (cost=23496.10..23505.20 rows=3642 width=151) (actual time=447.530..447.530 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Index Scan using projects_toolresult_tool_id_34a3bb16 on projects_toolresult (cost=0.56..23477.89 rows=3642 width=151) (actual time=447.513..447.513 rows=0 loops=1)
Index Cond: (tool_id = 21)
Filter: (status = 1)
Rows Removed by Filter: 6097
Planning time: 0.224 ms
Execution time: 447.571 ms
(10 rows)
It runs fast.
I have indexes on the updated_at and status fields (I also tried without them). I upgraded the default Postgres settings, increasing the values with this generator: https://pgtune.leopard.in.ua/#/
Postgres version 9.5
My table and indexes:
id | integer | not null default nextval('projects_toolresult_id_seq'::regclass)
status | smallint | not null
object_id | integer | not null
created_at | timestamp with time zone | not null
content_type_id | integer | not null
tool_id | integer | not null
updated_at | timestamp with time zone | not null
output_data | text | not null
Indexes:
"projects_toolresult_pkey" PRIMARY KEY, btree (id)
"projects_toolresult_content_type_id_object_i_71ee2c2e_uniq" UNIQUE CONSTRAINT, btree (content_type_id, object_id, tool_id)
"projects_to_created_cee389_idx" btree (created_at)
"projects_to_tool_id_ec7856_idx" btree (tool_id, status)
"projects_to_updated_266459_idx" btree (updated_at)
"projects_toolresult_content_type_id_9924d905" btree (content_type_id)
"projects_toolresult_tool_id_34a3bb16" btree (tool_id)
Check constraints:
"projects_toolresult_object_id_check" CHECK (object_id >= 0)
"projects_toolresult_status_check" CHECK (status >= 0)
Foreign-key constraints:
"projects_toolresult_content_type_id_9924d905_fk_django_co" FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED
"projects_toolresult_tool_id_34a3bb16_fk_projects_tool_id" FOREIGN KEY (tool_id) REFERENCES projects_tool(id) DEFERRABLE INITIALLY DEFERRED
You are filtering your data on status and tool_id and sorting on updated_at, but you have no single index covering all three of those columns.
Add an index, like so:
CREATE INDEX ON projects_toolresult (status, tool_id, updated_at);
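The equality columns (status, tool_id) come first so the scan lands on exactly the matching rows, which the index then yields already ordered by updated_at; the LIMIT 1 stops after the first entry. If status = 1 is the only status ever queried this way (an assumption, not something stated in the question), a partial index is a smaller alternative:
CREATE INDEX ON projects_toolresult (tool_id, updated_at)
    WHERE status = 1;  -- assumes status = 1 is the only value queried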
I have a simple table in a PostgreSQL 9.0.3 database that holds data polled from a wind turbine controller. Each row represents the value of a particular sensor at a particular time. Currently the table has around 90M rows:
wtdata=> \d integer_data
Table "public.integer_data"
Column | Type | Modifiers
--------+--------------------------+-----------
date | timestamp with time zone | not null
name | character varying(64) | not null
value | integer | not null
Indexes:
"integer_data_pkey" PRIMARY KEY, btree (date, name)
"integer_data_date_idx" btree (date)
"integer_data_name_idx" btree (name)
One query that I need is to find the last time that a variable was updated:
select max(date) from integer_data where name = '<name of variable>';
This query works fine when searching for a variable that exists in the table:
wtdata=> select max(date) from integer_data where name = 'STATUS_OF_OUTPUTS_UINT16';
max
------------------------
2011-04-11 02:01:40-05
(1 row)
However, if I try and search for a variable that doesn't exist in the table, the query hangs (or takes longer than I have patience for):
select max(date) from integer_data where name = 'Message';
I've let the query run for hours and sometimes days with no end in sight. There are no rows in the table with name = 'Message':
wtdata=> select count(*) from integer_data where name = 'Message';
count
-------
0
(1 row)
I don't understand why one query is fast and the other takes forever. Is the query somehow being forced to scan the entire table for some reason?
wtdata=> explain select max(date) from integer_data where name = 'Message';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Result (cost=13.67..13.68 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..13.67 rows=1 width=8)
-> Index Scan Backward using integer_data_pkey on integer_data (cost=0.00..6362849.53 rows=465452 width=8)
Index Cond: ((date IS NOT NULL) AND ((name)::text = 'Message'::text))
(5 rows)
Here's the query plan for a fast query:
wtdata=> explain select max(date) from integer_data where name = 'STATUS_OF_OUTPUTS_UINT16';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Result (cost=4.64..4.65 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..4.64 rows=1 width=8)
-> Index Scan Backward using integer_data_pkey on integer_data (cost=0.00..16988170.38 rows=3659570 width=8)
Index Cond: ((date IS NOT NULL) AND ((name)::text = 'STATUS_OF_OUTPUTS_UINT16'::text))
(5 rows)
Change the primary key to (name,date).
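With the current key (date, name), the backward index scan starts at the newest date and walks the entire index hoping to hit a 'Message' entry that never comes; with name first, all entries for one name are adjacent, and max(date) is a single index probe whether the name exists or not. A sketch of the change (this rebuilds the index and locks the table while it runs, so schedule it accordingly):
BEGIN;
-- swap the key order so all rows for one name are adjacent in the index
ALTER TABLE integer_data DROP CONSTRAINT integer_data_pkey;
ALTER TABLE integer_data ADD PRIMARY KEY (name, date);
COMMIT;
The separate integer_data_name_idx then duplicates the new key's leading column and could be dropped.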
Our application has a very slow statement; it takes more than 11 seconds, so I want to know whether there is any way to optimize it.
The SQL statement
SELECT id FROM mapfriends.cell_forum_topic WHERE id in (
SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid )
AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
id
---------
2471959
2382296
1535967
2432006
2367281
2159706
1501759
1549304
2179763
1598043
(10 rows)
Time: 11444.976 ms
Plan
friends=> explain SELECT id FROM friends.cell_forum_topic WHERE id in (
friends(> SELECT topicid FROM friends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid)
friends-> AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443.15..1443.15 rows=2 width=12)
-> Sort (cost=1443.15..1443.15 rows=2 width=12)
Sort Key: cell_forum_topic.restoretime
-> Nested Loop (cost=1434.28..1443.14 rows=2 width=12)
-> HashAggregate (cost=1434.28..1434.30 rows=2 width=4)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1430.49 rows=1516 width=4)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
(10 rows)
Time: 1.109 ms
Indexes
friends=> \d cell_forum_item
Table "friends.cell_forum_item"
Column | Type | Modifiers
---------+--------------------------------+--------------------------------------------------------------
id | integer | not null default nextval('cell_forum_item_id_seq'::regclass)
topicid | integer | not null
skyid | integer | not null
content | character varying(200) |
addtime | timestamp(0) without time zone | default now()
ischeck | boolean |
Indexes:
"cell_forum_item_pkey" PRIMARY KEY, btree (id)
"cell_forum_item_idx" btree (topicid, skyid)
"cell_forum_item_idx_1" btree (topicid, id)
"cell_forum_item_idx_skyid" btree (skyid)
friends=> \d cell_forum_topic
Table "friends.cell_forum_topic"
Column | Type | Modifiers
-------------+--------------------------------+--------------------------------------------------------------------------------------
id | integer | not null default nextval(('"friends"."cell_forum_topic_id_seq"'::text)::regclass)
categoryid | integer | not null
topic | character varying | not null
content | character varying | not null
skyid | integer | not null
addtime | timestamp(0) without time zone | default now()
reference | integer | default 0
restore | integer | default 0
restoretime | timestamp(0) without time zone | default now()
locked | boolean | default false
settop | boolean | default false
hidden | boolean | default false
feature | boolean | default false
picid | integer | default 29249
managerid | integer |
imageid | integer | default 0
pass | boolean | default false
ischeck | boolean |
Indexes:
"cell_forum_topic_pkey" PRIMARY KEY, btree (id)
"idx_cell_forum_topic_1" btree (categoryid, settop, hidden, restoretime, skyid)
"idx_cell_forum_topic_2" btree (categoryid, hidden, restoretime, skyid)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
"idx_cell_forum_topic_4" btree (categoryid, hidden, restore)
"idx_cell_forum_topic_5" btree (categoryid, hidden, restoretime, feature)
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
Explain analyze
mapfriends=> explain analyze SELECT id FROM mapfriends.cell_forum_topic
mapfriends-> join (SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid) as tmp
mapfriends-> on mapfriends.cell_forum_topic.id=tmp.topicid
mapfriends-> where categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.006..18016.013 rows=10 loops=1)
-> Sort (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.001..18016.002 rows=10 loops=1)
Sort Key: cell_forum_topic.restoretime
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1438.02..1446.88 rows=2 width=12) (actual time=16988.492..18015.869 rows=20 loops=1)
-> HashAggregate (cost=1438.02..1438.04 rows=2 width=4) (actual time=15446.735..15447.243 rows=610 loops=1)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12) (actual time=4.210..4.210 rows=0 loops=610)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
Total runtime: 18019.461 ms
Could you give us some more information about the tables (the statistics) and the configuration?
SELECT version();
SELECT category, name, setting FROM pg_settings WHERE name IN('effective_cache_size', 'enable_seqscan', 'shared_buffers');
SELECT * FROM pg_stat_user_tables WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stat_user_indexes WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stats WHERE tablename IN('cell_forum_topic', 'cell_forum_item');
And before getting this data, use ANALYZE.
It looks like you have a problem with an index; this is where the query spends all its time:
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
If you use VACUUM FULL on a regular basis (NOT RECOMMENDED!), index bloat might be your problem. A REINDEX might be a good idea, just to be sure:
REINDEX TABLE cell_forum_item;
And talking about indexes: you can drop a couple of them; these are obsolete:
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
Other indexes starting with the same columns contain the same data and can be used by the database instead.
It looks like you have a couple of problems:
1. autovacuum is turned off, or it's way behind. The last autovacuum was on 2010-12-02, and you have 256734 dead tuples in one table and 451430 dead ones in the other. You have to do something about this; it is a serious problem.
2. When autovacuum is working again, you have to do a VACUUM FULL and a REINDEX to force a table rewrite and get rid of all the empty space in your tables.
3. After fixing the vacuum problem, you have to ANALYZE as well: the database expects 1520 results but gets 7133. This could be a problem with statistics; maybe you have to increase the statistics target.
4. The query itself needs some rewriting as well: it gets 7133 results but needs only 610. Over 90% of the results are thrown away, and getting those 7133 takes a lot of time, over 15 seconds. Get rid of the subquery by using a JOIN without the GROUP BY, or use EXISTS, also without the GROUP BY (see the sketch below).
But first get autovacuum back on track, before you get new or other problems.
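A sketch of the EXISTS rewrite, using the same tables and columns as the question (how much it helps here will depend on the statistics being fixed first):
SELECT id
FROM mapfriends.cell_forum_topic t
WHERE t.categoryid = 29
  AND t.hidden = false
  AND EXISTS (SELECT 1
              FROM mapfriends.cell_forum_item i
              WHERE i.topicid = t.id      -- one probe per candidate topic
                AND i.skyid = 103230293)
ORDER BY t.restoretime DESC
LIMIT 10;
With the existing cell_forum_item_idx on (topicid, skyid), each EXISTS probe is a single index lookup, and the per-topic GROUP BY disappears.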
The problem isn't due to a lack of query-plan caching, but most likely to the planner's choice of plan, caused by a lack of appropriate indexes.