Index is created but I get Seq Scan - postgresql

PostgreSQL 9.5.7
I have created an index for username.
Now I do:
explain analyze select * from authentification_user where username='Pele';
And I get:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on authentification_user (cost=0.00..1.25 rows=1 width=478) (actual time=0.017..0.019 rows=1 loops=1)
Filter: ((username)::text = 'Pele'::text)
Rows Removed by Filter: 19
Planning time: 0.101 ms
Execution time: 0.040 ms
(5 rows)
The problem is: why is it a Seq Scan? It is supposed to be an Index Scan. Could you give me a kick here?
simple_project=# \d authentification_user
Table "public.authentification_user"
Column | Type | Modifiers
----------+------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('authentification_user_id_seq'::regclass)
username | character varying(10) | not null
pw_hash | character varying(100) | not null
email | character varying(100) | not null
Indexes:
"authentification_user_pkey" PRIMARY KEY, btree (id)
"authentification_user_username_e9ac7af7" btree (username)
"authentification_user_username_e9ac7af7_like" btree (username varchar_pattern_ops)
simple_project=#

There's a good Postgres Wiki page about index vs. sequential scan usage that explains this.
In your case, there is simply not enough data to gain any benefit from an Index Scan.
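If you want to convince yourself that the index is usable at all, you can temporarily discourage sequential scans for your session and re-run the EXPLAIN. enable_seqscan is a standard planner setting, and this is a diagnostic trick only, not something for production:
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM authentification_user WHERE username = 'Pele';
RESET enable_seqscan;
With only 20 rows the whole table fits in a single 8 kB page, so reading it sequentially is genuinely cheaper than descending an index; the planner will switch to an Index Scan on its own once the table grows.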


Postgresql simple query on moderately large table is extremely slow

I have a PostgreSQL DB on Google Cloud SQL:
4 vCPUS
15 GB RAM
55 GB SSD
The relevant schema is:
postgres=> \d device;
Table "public.device"
Column | Type | Collation | Nullable | Default
-------------------------+-------------------------+-----------+----------+---------
id | uuid | | not null |
brand | character varying(255) | | |
model | character varying(255) | | |
serialnumber | character varying(255) | | |
[...]
Indexes:
"device_pkey" PRIMARY KEY, btree (id)
[...]
Referenced by:
TABLE "application" CONSTRAINT "fk_application_device_id_device" FOREIGN KEY (device_id) REFERENCES device(id) ON DELETE CASCADE
[...]
postgres=> \d application;
Table "public.application"
Column | Type | Collation | Nullable | Default
----------------------------+------------------------+-----------+----------+---------
id | uuid | | not null |
device_id | uuid | | not null |
packagename | character varying(255) | | |
versionname | character varying(255) | | |
[...]
Indexes:
"application_pkey" PRIMARY KEY, btree (id)
"application_device_id_packagename_key" UNIQUE CONSTRAINT, btree (device_id, packagename)
Foreign-key constraints:
"fk_application_device_id_device" FOREIGN KEY (device_id) REFERENCES device(id) ON DELETE CASCADE
[...]
Volumetry:
device table: 16k rows
application: 3.6M rows
When trying something as simple as:
select count(id) from application;
The query took 900 seconds (sic) to count those 3.6M rows.
Here is the execution plan:
postgres=> explain analyze select count(id) from application;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1245180.18..1245180.19 rows=1 width=8) (actual time=311470.250..311496.933 rows=1 loops=1)
-> Gather (cost=1245179.96..1245180.17 rows=2 width=8) (actual time=311470.225..311496.919 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1244179.96..1244179.97 rows=1 width=8) (actual time=311463.287..311463.289 rows=1 loops=3)
-> Parallel Seq Scan on application (cost=0.00..1234885.77 rows=3717677 width=16) (actual time=79.783..311296.505 rows=1202169 loops=3)
Planning Time: 0.083 ms
Execution Time: 311497.021 ms
(8 rows)
It seems like everything (such as keys and indexes) is correctly set up, so what could be the reason for this simple query to take so long?
You have to look deeper to determine the cause:
turn on track_io_timing in the PostgreSQL configuration so that you can see how long I/O takes
use EXPLAIN (ANALYZE, BUFFERS) to see how many 8kB blocks are touched
If the number of blocks is very high, your table is bloated (it consists mostly of empty space), and the sequential scan takes so long because it has to read all that empty space. VACUUM (FULL) can help with that.
If the block count is as you would expect, the problem is that your storage is too slow.
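To illustrate both steps, a minimal diagnostic session might look like this (track_io_timing is a standard setting, though changing it per session requires superuser rights; pg_relation_size and pg_stat_user_tables are standard catalog tools for a rough bloat check):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM application;
-- rough bloat check: size on disk vs. live/dead tuple counts
SELECT pg_size_pretty(pg_relation_size('application'));
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'application';
If BUFFERS reports far more blocks than 3.6M rows should plausibly occupy, bloat is the culprit; otherwise look at the I/O timings.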

slow order by "field" and limit

I have a simple query that must get 1 record from a table with about 14M records:
EXPLAIN ANALYZE
SELECT "projects_toolresult"."id",
       "projects_toolresult"."tool_id",
       "projects_toolresult"."status",
       "projects_toolresult"."updated_at",
       "projects_toolresult"."created_at"
FROM "projects_toolresult"
WHERE "projects_toolresult"."status" = 1
  AND "projects_toolresult"."tool_id" = 21
ORDER BY "projects_toolresult"."updated_at" DESC
LIMIT 1;
And it is weird that when I order the query by the updated_at field, it runs for 60 seconds:
Limit (cost=0.43..510.94 rows=1 width=151) (actual time=56754.932..56754.932 rows=0 loops=1)
  -> Index Scan using projects_to_updated_266459_idx on projects_toolresult (cost=0.43..1800549.09 rows=3527 width=151) (actual time=56754.930..56754.930 rows=0 loops=1)
       Filter: ((status = 1) AND (tool_id = 21))
       Rows Removed by Filter: 13709343
Planning time: 0.236 ms
Execution time: 56754.968 ms
(6 rows)
It makes no difference whether it is ASC or DESC.
But if I do ORDER BY random() or leave out the order entirely:
Limit (cost=23496.10..23496.10 rows=1 width=151) (actual time=447.532..447.532 rows=0 loops=1)
-> Sort (cost=23496.10..23505.20 rows=3642 width=151) (actual time=447.530..447.530 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Index Scan using projects_toolresult_tool_id_34a3bb16 on projects_toolresult (cost=0.56..23477.89 rows=3642 width=151) (actual time=447.513..447.513 rows=0 loops=1)
Index Cond: (tool_id = 21)
Filter: (status = 1)
Rows Removed by Filter: 6097
Planning time: 0.224 ms
Execution time: 447.571 ms
(10 rows)
It works fast.
I have indexes on the updated_at and status fields (I also tried without them). I tuned the default Postgres settings, increasing values with this generator: https://pgtune.leopard.in.ua/#/
And this is what happens when these queries are run.
Postgres version 9.5
My table and indexes:
id | integer | not null default nextval('projects_toolresult_id_seq'::regclass)
status | smallint | not null
object_id | integer | not null
created_at | timestamp with time zone | not null
content_type_id | integer | not null
tool_id | integer | not null
updated_at | timestamp with time zone | not null
output_data | text | not null
Indexes:
"projects_toolresult_pkey" PRIMARY KEY, btree (id)
"projects_toolresult_content_type_id_object_i_71ee2c2e_uniq" UNIQUE CONSTRAINT, btree (content_type_id, object_id, tool_id)
"projects_to_created_cee389_idx" btree (created_at)
"projects_to_tool_id_ec7856_idx" btree (tool_id, status)
"projects_to_updated_266459_idx" btree (updated_at)
"projects_toolresult_content_type_id_9924d905" btree (content_type_id)
"projects_toolresult_tool_id_34a3bb16" btree (tool_id)
Check constraints:
"projects_toolresult_object_id_check" CHECK (object_id >= 0)
"projects_toolresult_status_check" CHECK (status >= 0)
Foreign-key constraints:
"projects_toolresult_content_type_id_9924d905_fk_django_co" FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED
"projects_toolresult_tool_id_34a3bb16_fk_projects_tool_id" FOREIGN KEY (tool_id) REFERENCES projects_tool(id) DEFERRABLE INITIALLY DEFERRED
You are filtering your data on status and tool_id, and sorting on updated_at, but you have no single index for all three of those columns.
Add an index, like so:
CREATE INDEX ON projects_toolresult (status, tool_id, updated_at);
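With the two filter columns leading and updated_at last, Postgres can descend straight to the newest matching entry and stop after a single row, instead of walking the updated_at index past 13.7M non-matching rows. On a live 14M-row table you may prefer to build it without blocking writes; a sketch, with an illustrative index name (note that CONCURRENTLY cannot run inside a transaction block):
CREATE INDEX CONCURRENTLY projects_toolresult_status_tool_updated_idx
    ON projects_toolresult (status, tool_id, updated_at);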

Efficient DB solution for system task tracking

I'm currently working on a data tracking system. The system is a multiprocess application written in Python that works in the following manner:
1. every S seconds it selects the N most appropriate tasks from the database (currently Postgres) and finds data for them
2. if there are no tasks, it creates N new tasks and returns to (1).
The problem is the following: currently I have approx. 80 GB of data and 36M tasks, and the queries against the task table are getting slower and slower (it is the most populated and most frequently used table).
The main bottleneck of performance is the task tracking query:
LOCK TABLE task IN ACCESS EXCLUSIVE MODE;
SELECT * FROM task
WHERE line = 1
  AND action = ANY(ARRAY['Find', 'Get'])
  AND (stat IN ('', 'CR1')
       OR stat = 'ERROR' AND (actiondate <= NOW() OR actiondate IS NULL))
ORDER BY taskid, actiondate, action DESC, idtype, date ASC
LIMIT 36;
Table "public.task"
Column | Type | Modifiers
------------+-----------------------------+-------------------------------------------------
number | character varying(16) | not null
date | timestamp without time zone | default now()
stat | character varying(16) | not null default ''::character varying
idtype | character varying(16) | not null default 'container'::character varying
uri | character varying(1024) |
action | character varying(16) | not null default 'Find'::character varying
reason | character varying(4096) | not null default ''::character varying
rev | integer | not null default 0
actiondate | timestamp without time zone |
modifydate | timestamp without time zone |
line | integer |
datasource | character varying(512) |
taskid | character varying(32) |
found | integer | not null default 0
Indexes:
"task_pkey" PRIMARY KEY, btree (idtype, number)
"action_index" btree (action)
"actiondate_index" btree (actiondate)
"date_index" btree (date)
"line_index" btree (line)
"modifydate_index" btree (modifydate)
"stat_index" btree (stat)
"taskid_index" btree (taskid)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=312638.87..312638.96 rows=36 width=668) (actual time=1838.193..1838.197 rows=36 loops=1)
-> Sort (cost=312638.87..313149.54 rows=204267 width=668) (actual time=1838.192..1838.194 rows=36 loops=1)
Sort Key: taskid, actiondate, action, idtype, date
Sort Method: top-N heapsort Memory: 43kB
-> Bitmap Heap Scan on task (cost=107497.61..306337.31 rows=204267 width=668) (actual time=1013.491..1343.751 rows=914586 loops=1)
Recheck Cond: ((((stat)::text = ANY ('{"",CR1}'::text[])) OR ((stat)::text = 'ERROR'::text)) AND (line = 1))
Filter: (((action)::text = ANY ('{Find,Get}'::text[])) AND (((stat)::text = ANY ('{"",CR1}'::text[])) OR (((stat)::text = 'ERROR'::text) AND ((actiondate <= now()) OR (actiondate IS NULL)))))
Rows Removed by Filter: 133
Heap Blocks: exact=76064
-> BitmapAnd (cost=107497.61..107497.61 rows=237348 width=0) (actual time=999.457..999.457 rows=0 loops=1)
-> BitmapOr (cost=9949.15..9949.15 rows=964044 width=0) (actual time=121.936..121.936 rows=0 loops=1)
-> Bitmap Index Scan on stat_index (cost=0.00..9449.46 rows=925379 width=0) (actual time=117.791..117.791 rows=920900 loops=1)
Index Cond: ((stat)::text = ANY ('{"",CR1}'::text[]))
-> Bitmap Index Scan on stat_index (cost=0.00..397.55 rows=38665 width=0) (actual time=4.144..4.144 rows=30262 loops=1)
Index Cond: ((stat)::text = 'ERROR'::text)
-> Bitmap Index Scan on line_index (cost=0.00..97497.14 rows=9519277 width=0) (actual time=853.033..853.033 rows=9605462 loops=1)
Index Cond: (line = 1)
Planning time: 0.284 ms
Execution time: 1838.882 ms
(19 rows)
Of course, all involved fields are indexed. I'm currently thinking in two directions:
1. how to optimize the query, and whether it will actually give me a performance improvement going forward (currently it takes approx. 10 seconds per query, which is unacceptable for dynamic task tracking)
2. where and how it would be more effective to store the task data - maybe I should use another DB for this purpose, such as Cassandra, VoltDB or another Big Data store?
I think the data should somehow be preordered so that the most relevant tasks can be fetched as fast as possible.
Also, please keep in mind that my current volume of 80 GB is most likely a minimum rather than a maximum for such a task.
Thanks in advance!
I don't quite understand your use case, but it doesn't look to me like your indexes are working too well. The query is relying mostly on the stat index. I think you need to look into a composite index, something like (action, line, stat).
Another option is to shard your data across multiple tables, splitting it on some key with low cardinality. I don't use Postgres, but I don't think switching to another DB solution is going to work better unless you know exactly what you're optimizing for.
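A sketch of both suggestions, with illustrative index names and a column order guessed from the filters in the query (verify with EXPLAIN ANALYZE):
-- composite index matching the equality filters
CREATE INDEX task_action_line_stat_idx ON task (action, line, stat);
-- or a partial index targeting only the hot rows; it stays much smaller
-- when most rows never match the filter
CREATE INDEX task_pending_idx ON task (taskid, actiondate)
    WHERE line = 1 AND action IN ('Find', 'Get');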

Optimizing ORDER BY from joined column

I have a display that lists each user in a city and their last message in a table sorted by the newest messages first:
Users
------
Caleb - Hey what's up?
------
Bill - Is there anything up tonight?
------
Jon - What's up man?
------
Any help optimizing this query below and/or helping figure out where to add indexes would be very helpful.
I could probably denormalize and store last_message_created_at on users but want to avoid that backfill.
Users Table:
Table "public.users"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | integer | not null
role | user_role | not null
last_message_id | integer |
city_id | integer |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"ix_users_city_id" btree (city_id)
"ix_users_last_message_id" btree (last_message_id)
Foreign-key constraints:
"messages_last_message_id_fkey" FOREIGN KEY (last_message_id) REFERENCES messages(id)
"users_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(id)
Referenced by:
TABLE "messages" CONSTRAINT "messages_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES users(id)
TABLE "messages" CONSTRAINT "messages_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES users(id)
Messages Table:
Table "public.messages"
Column | Type | Modifiers
------------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('messages_id_seq'::regclass)
content | character varying |
from_user_id | integer |
to_user_id | integer |
created_at | timestamp without time zone |
Indexes:
"messages_pkey" PRIMARY KEY, btree (id)
"ix_messages_from_user_id" btree (from_user_id)
"ix_messages_to_user_id" btree (to_user_id)
Foreign-key constraints:
"messages_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES users(id)
"messages_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES users(id)
Referenced by:
TABLE "users" CONSTRAINT "messages_last_message_id_fkey" FOREIGN KEY (last_message_id) REFERENCES messages(id)
Query and Plan:
EXPLAIN ANALYZE
SELECT users.city_id AS users_city_id, users.id AS users_id, users.last_message_id AS users_last_message_id
FROM users
JOIN messages ON messages.id = users.last_message_id
WHERE users.city_id = 1 ORDER BY users.last_message_id DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=951606.67..951811.07 rows=81760 width=12) (actual time=1934.501..1998.216 rows=79454 loops=1)
Sort Key: users.last_message_id
Sort Method: quicksort Memory: 6797kB
-> Nested Loop (cost=1575.71..944935.41 rows=81760 width=12) (actual time=26.784..1817.478 rows=79454 loops=1)
-> Bitmap Heap Scan on users (cost=1575.71..33209.21 rows=84040 width=12) (actual time=26.656..393.749 rows=85348 loops=1)
Recheck Cond: (city_id = 1)
-> Bitmap Index Scan on ix_users_city_id (cost=0.00..1554.70 rows=84040 width=0) (actual time=20.679..20.679 rows=85348 loops=1)
Index Cond: (city_id = 1)
-> Index Only Scan using messages_pkey on messages (cost=0.00..10.84 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=85348)
Index Cond: (id = users.last_message_id)
Heap Fetches: 79454
Total runtime: 2058.134 ms
(12 rows)
You never use the messages table in the query for any purpose other than excluding users without messages, so this might be faster:
select
u.city_id as users_city_id,
u.id as users_id,
u.last_message_id as users_last_message_id
from users u
where
u.city_id = 1
and exists (
select 1
from messages
where id = u.last_message_id
)
order by u.last_message_id desc
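If the sort remains the expensive step, a composite index matching both the filter and the sort order may let Postgres return the rows already ordered and skip the quicksort entirely; a sketch, with an illustrative index name:
CREATE INDEX ix_users_city_id_last_message_id ON users (city_id, last_message_id DESC);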

A slow SQL statement, is there any way to optimize it?

Our application has a very slow statement; it takes more than 11 seconds, so I want to know whether there is any way to optimize it.
The SQL statement
SELECT id FROM mapfriends.cell_forum_topic WHERE id in (
SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid )
AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
id
---------
2471959
2382296
1535967
2432006
2367281
2159706
1501759
1549304
2179763
1598043
(10 rows)
Time: 11444.976 ms
Plan
friends=> explain SELECT id FROM friends.cell_forum_topic WHERE id in (
friends(> SELECT topicid FROM friends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid)
friends-> AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443.15..1443.15 rows=2 width=12)
-> Sort (cost=1443.15..1443.15 rows=2 width=12)
Sort Key: cell_forum_topic.restoretime
-> Nested Loop (cost=1434.28..1443.14 rows=2 width=12)
-> HashAggregate (cost=1434.28..1434.30 rows=2 width=4)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1430.49 rows=1516 width=4)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
(10 rows)
Time: 1.109 ms
Indexes
friends=> \d cell_forum_item
Table "friends.cell_forum_item"
Column | Type | Modifiers
---------+--------------------------------+--------------------------------------------------------------
id | integer | not null default nextval('cell_forum_item_id_seq'::regclass)
topicid | integer | not null
skyid | integer | not null
content | character varying(200) |
addtime | timestamp(0) without time zone | default now()
ischeck | boolean |
Indexes:
"cell_forum_item_pkey" PRIMARY KEY, btree (id)
"cell_forum_item_idx" btree (topicid, skyid)
"cell_forum_item_idx_1" btree (topicid, id)
"cell_forum_item_idx_skyid" btree (skyid)
friends=> \d cell_forum_topic
Table "friends.cell_forum_topic"
Column | Type | Modifiers
-------------+--------------------------------+--------------------------------------------------------------------------------------
id | integer | not null default nextval(('"friends"."cell_forum_topic_id_seq"'::text)::regclass)
categoryid | integer | not null
topic | character varying | not null
content | character varying | not null
skyid | integer | not null
addtime | timestamp(0) without time zone | default now()
reference | integer | default 0
restore | integer | default 0
restoretime | timestamp(0) without time zone | default now()
locked | boolean | default false
settop | boolean | default false
hidden | boolean | default false
feature | boolean | default false
picid | integer | default 29249
managerid | integer |
imageid | integer | default 0
pass | boolean | default false
ischeck | boolean |
Indexes:
"cell_forum_topic_pkey" PRIMARY KEY, btree (id)
"idx_cell_forum_topic_1" btree (categoryid, settop, hidden, restoretime, skyid)
"idx_cell_forum_topic_2" btree (categoryid, hidden, restoretime, skyid)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
"idx_cell_forum_topic_4" btree (categoryid, hidden, restore)
"idx_cell_forum_topic_5" btree (categoryid, hidden, restoretime, feature)
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
Explain analyze
mapfriends=> explain analyze SELECT id FROM mapfriends.cell_forum_topic
mapfriends-> join (SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid) as tmp
mapfriends-> on mapfriends.cell_forum_topic.id=tmp.topicid
mapfriends-> where categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.006..18016.013 rows=10 loops=1)
-> Sort (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.001..18016.002 rows=10 loops=1)
Sort Key: cell_forum_topic.restoretime
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1438.02..1446.88 rows=2 width=12) (actual time=16988.492..18015.869 rows=20 loops=1)
-> HashAggregate (cost=1438.02..1438.04 rows=2 width=4) (actual time=15446.735..15447.243 rows=610 loops=1)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12) (actual time=4.210..4.210 rows=0 loops=610)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
Total runtime: 18019.461 ms
Could you give us some more information about the tables (the statistics) and the configuration?
SELECT version();
SELECT category, name, setting FROM pg_settings WHERE name IN('effective_cache_size', 'enable_seqscan', 'shared_buffers');
SELECT * FROM pg_stat_user_tables WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stat_user_indexes WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stats WHERE tablename IN('cell_forum_topic', 'cell_forum_item');
And before getting this data, use ANALYZE.
It looks like you have a problem with an index; this is where the query spends all its time:
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item
   (cost=0.00..1434.22 rows=1520 width=4)
   (actual time=302.378..15429.782 rows=7133 loops=1)
If you use VACUUM FULL on a regular basis (NOT RECOMMENDED!), index bloat might be your problem. A REINDEX might be a good idea, just to be sure:
REINDEX TABLE cell_forum_item;
And talking about indexes, you can drop a couple of them, these are obsolete:
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
Other indexes start with the same columns and can be used by the database just as well.
It looks like you have a couple of problems:
1. autovacuum is turned off or it's way behind. The last autovacuum was on 2010-12-02 and you have 256734 dead tuples in one table and 451430 dead ones in the other... You have to do something about this; it is a serious problem.
2. When autovacuum is working again, you have to do a VACUUM FULL and a REINDEX to force a table rewrite and get rid of all the empty space in your tables.
3. After fixing the vacuum problem, you have to analyze as well: the database expects 1520 results but gets 7133. This could be a problem with statistics; maybe you have to increase the STATISTICS target.
4. The query itself needs some rewriting as well: it gets 7133 results but needs only 610. Over 90% of the results are thrown away... and getting those 7133 takes a lot of time, over 15 seconds. Get rid of the subquery by using a JOIN without the GROUP BY, or use EXISTS, also without the GROUP BY; see the sketch below.
But first get autovacuum back on track, before you get new or other problems.
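A minimal sketch of the EXISTS rewrite suggested in point 4, using the tables and filter values from the question:
SELECT t.id
FROM mapfriends.cell_forum_topic t
WHERE t.categoryid = 29
  AND t.hidden = false
  AND EXISTS (
        SELECT 1
        FROM mapfriends.cell_forum_item i
        WHERE i.topicid = t.id
          AND i.skyid = 103230293)
ORDER BY t.restoretime DESC
LIMIT 10;
This avoids aggregating 7133 item rows down to 610 topic ids before the join: with EXISTS the planner can stop probing cell_forum_item as soon as it finds one matching row per topic.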
The problem isn't due to a lack of query-plan caching; it is most likely due to the choice of plan, which in turn stems from a lack of appropriate indexes.
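For this query, a hedged guess at an appropriate index would be one on cell_forum_item covering both the filter column and the join column, so that all matching topicid values sit together in a single index (the index name is illustrative):
CREATE INDEX cell_forum_item_skyid_topicid_idx ON cell_forum_item (skyid, topicid);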