I have a display that lists each user in a city together with their last message, in a table sorted newest message first:
Users
------
Caleb - Hey what's up?
------
Bill - Is there anything up tonight?
------
Jon - What's up man?
------
Any help optimizing the query below and/or figuring out where to add indexes would be much appreciated.
I could probably denormalize and store last_message_created_at on users, but I want to avoid that backfill.
Users Table:
Table "public.users"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | integer | not null
role | user_role | not null
last_message_id | integer |
city_id | integer |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"ix_users_city_id" btree (city_id)
"ix_users_last_message_id" btree (last_message_id)
Foreign-key constraints:
"messages_last_message_id_fkey" FOREIGN KEY (last_message_id) REFERENCES messages(id)
"users_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(id)
Referenced by:
TABLE "messages" CONSTRAINT "messages_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES users(id)
TABLE "messages" CONSTRAINT "messages_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES users(id)
Messages Table:
Table "public.messages"
Column | Type | Modifiers
------------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('messages_id_seq'::regclass)
content | character varying |
from_user_id | integer |
to_user_id | integer |
created_at | timestamp without time zone |
Indexes:
"messages_pkey" PRIMARY KEY, btree (id)
"ix_messages_from_user_id" btree (from_user_id)
"ix_messages_to_user_id" btree (to_user_id)
Foreign-key constraints:
"messages_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES users(id)
"messages_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES users(id)
Referenced by:
TABLE "users" CONSTRAINT "messages_last_message_id_fkey" FOREIGN KEY (last_message_id) REFERENCES messages(id)
Query and Plan:
EXPLAIN ANALYZE
SELECT users.city_id AS users_city_id, users.id AS users_id, users.last_message_id AS users_last_message_id
FROM users
JOIN messages ON messages.id = users.last_message_id
WHERE users.city_id = 1 ORDER BY users.last_message_id DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=951606.67..951811.07 rows=81760 width=12) (actual time=1934.501..1998.216 rows=79454 loops=1)
Sort Key: users.last_message_id
Sort Method: quicksort Memory: 6797kB
-> Nested Loop (cost=1575.71..944935.41 rows=81760 width=12) (actual time=26.784..1817.478 rows=79454 loops=1)
-> Bitmap Heap Scan on users (cost=1575.71..33209.21 rows=84040 width=12) (actual time=26.656..393.749 rows=85348 loops=1)
Recheck Cond: (city_id = 1)
-> Bitmap Index Scan on ix_users_city_id (cost=0.00..1554.70 rows=84040 width=0) (actual time=20.679..20.679 rows=85348 loops=1)
Index Cond: (city_id = 1)
-> Index Only Scan using messages_pkey on messages (cost=0.00..10.84 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=85348)
Index Cond: (id = users.last_message_id)
Heap Fetches: 79454
Total runtime: 2058.134 ms
(12 rows)
You never use the messages table in the query for any purpose other than excluding users without messages, so this might be faster:
select
u.city_id as users_city_id,
u.id as users_id,
u.last_message_id as users_last_message_id
from users u
where
u.city_id = 1
and exists (
select 1
from messages
where id = u.last_message_id
)
order by u.last_message_id desc
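If the sort itself turns out to be the bottleneck, a composite index matching both the filter and the sort order may let the planner return rows already ordered and skip the quicksort entirely; a sketch under that assumption (the index name is illustrative):
-- one index covering the equality filter and the sort column,
-- scannable backward to satisfy order by last_message_id desc
create index ix_users_city_id_last_message_id
    on users (city_id, last_message_id desc);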
I have a simple query that must get 1 record from a table with about 14M records:
EXPLAIN ANALYZE
SELECT "projects_toolresult"."id",
       "projects_toolresult"."tool_id",
       "projects_toolresult"."status",
       "projects_toolresult"."updated_at",
       "projects_toolresult"."created_at"
FROM "projects_toolresult"
WHERE ("projects_toolresult"."status" = 1
       AND "projects_toolresult"."tool_id" = 21)
ORDER BY "projects_toolresult"."updated_at" DESC
LIMIT 1;
And it is weird that when I order the query by the updated_at field, it takes 60 seconds to execute.
Limit (cost=0.43..510.94 rows=1 width=151) (actual time=56754.932..56754.932 rows=0 loops=1)
  -> Index Scan using projects_to_updated_266459_idx on projects_toolresult (cost=0.43..1800549.09 rows=3527 width=151) (actual time=56754.930..56754.930 rows=0 loops=1)
       Filter: ((status = 1) AND (tool_id = 21))
       Rows Removed by Filter: 13709343
Planning time: 0.236 ms
Execution time: 56754.968 ms
(6 rows)
It makes no difference whether it is ASC or DESC.
But if I ORDER BY random() or run it without any ORDER BY:
Limit (cost=23496.10..23496.10 rows=1 width=151) (actual time=447.532..447.532 rows=0 loops=1)
-> Sort (cost=23496.10..23505.20 rows=3642 width=151) (actual time=447.530..447.530 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Index Scan using projects_toolresult_tool_id_34a3bb16 on projects_toolresult (cost=0.56..23477.89 rows=3642 width=151) (actual time=447.513..447.513 rows=0 loops=1)
Index Cond: (tool_id = 21)
Filter: (status = 1)
Rows Removed by Filter: 6097
Planning time: 0.224 ms
Execution time: 447.571 ms
(10 rows)
It works fast.
I have an index on the updated_at and status fields (I also tried without them, too). I upgraded the default Postgres settings, increasing values with this generator: https://pgtune.leopard.in.ua/#/
And this is what happens when these queries are in action.
Postgres version 9.5
My table and indexes:
Table "public.projects_toolresult"
Column | Type | Modifiers
-----------------+--------------------------+-------------------------------------------------------------------
id | integer | not null default nextval('projects_toolresult_id_seq'::regclass)
status | smallint | not null
object_id | integer | not null
created_at | timestamp with time zone | not null
content_type_id | integer | not null
tool_id | integer | not null
updated_at | timestamp with time zone | not null
output_data | text | not null
Indexes:
"projects_toolresult_pkey" PRIMARY KEY, btree (id)
"projects_toolresult_content_type_id_object_i_71ee2c2e_uniq" UNIQUE CONSTRAINT, btree (content_type_id, object_id, tool_id)
"projects_to_created_cee389_idx" btree (created_at)
"projects_to_tool_id_ec7856_idx" btree (tool_id, status)
"projects_to_updated_266459_idx" btree (updated_at)
"projects_toolresult_content_type_id_9924d905" btree (content_type_id)
"projects_toolresult_tool_id_34a3bb16" btree (tool_id)
Check constraints:
"projects_toolresult_object_id_check" CHECK (object_id >= 0)
"projects_toolresult_status_check" CHECK (status >= 0)
Foreign-key constraints:
"projects_toolresult_content_type_id_9924d905_fk_django_co" FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED
"projects_toolresult_tool_id_34a3bb16_fk_projects_tool_id" FOREIGN KEY (tool_id) REFERENCES projects_tool(id) DEFERRABLE INITIALLY DEFERRED
You are filtering your data on status and tool_id and sorting on updated_at, but you have no single index for all three of those columns. With only the updated_at index available in sort order, Postgres walks that index from newest to oldest and filters each row; since no row matches here, it visits all ~13.7 million entries (the Rows Removed by Filter line above).
Add an index, like so:
CREATE INDEX ON projects_toolresult (status, tool_id, updated_at);
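The equality columns (status, tool_id) lead the index and the sort column comes last, so matching rows are already stored in updated_at order. With the index in place, re-running the query should show a backward index scan with no Sort node and nothing removed by a filter; worth verifying:
EXPLAIN ANALYZE
SELECT id, tool_id, status, updated_at, created_at
FROM projects_toolresult
WHERE status = 1 AND tool_id = 21
ORDER BY updated_at DESC
LIMIT 1;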
PostgreSQL 9.5.7
I have created an index on username.
Now I do:
explain analyze select * from authentification_user where username='Pele';
And I get:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on authentification_user (cost=0.00..1.25 rows=1 width=478) (actual time=0.017..0.019 rows=1 loops=1)
Filter: ((username)::text = 'Pele'::text)
Rows Removed by Filter: 19
Planning time: 0.101 ms
Execution time: 0.040 ms
(5 rows)
The problem is: why is it a Seq Scan? It is supposed to be an Index Scan. Could you give me a kick here?
simple_project=# \d authentification_user
Table "public.authentification_user"
Column | Type | Modifiers
----------+------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('authentification_user_id_seq'::regclass)
username | character varying(10) | not null
pw_hash | character varying(100) | not null
email | character varying(100) | not null
Indexes:
"authentification_user_pkey" PRIMARY KEY, btree (id)
"authentification_user_username_e9ac7af7" btree (username)
"authentification_user_username_e9ac7af7_like" btree (username varchar_pattern_ops)
simple_project=#
There's a good Postgres Wiki page about index vs. sequential scan usage explaining this.
In your case, there is simply not enough data to gain any benefit from an Index Scan.
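To confirm the index is usable at all, you can make sequential scans prohibitively expensive for the current session; this is purely a diagnostic, not a production setting:
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM authentification_user WHERE username = 'Pele';
SET enable_seqscan = on;
With only 20 rows in the table (1 returned plus 19 removed by the filter), reading a single heap page sequentially is cheaper than descending the index, so the planner's choice is correct.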
I'm trying to determine whether there is a "low cost" optimization for the following query. We've implemented a system whereby 'tickets' earn 'points' and thus can be ranked. In order to support analytical queries, we store the rank of every ticket, and whether the ticket is tied, along with the ticket.
I've found that, at scale, storing the is_tied field is very slow. I'm attempting to run the scenario below on sets of about 20-75k tickets.
I'm hoping that someone can help identify why and offer some help.
We're on postgres 9.3.6
Here's a simplified ticket table schema:
ogs_1=> \d api_ticket
Table "public.api_ticket"
Column | Type | Modifiers
------------------------------+--------------------------+---------------------------------------------------------
id | integer | not null default nextval('api_ticket_id_seq'::regclass)
status | character varying(3) | not null
points_earned | integer | not null
rank | integer | not null
event_id | integer | not null
user_id | integer | not null
is_tied | boolean | not null
Indexes:
"api_ticket_pkey" PRIMARY KEY, btree (id)
"api_ticket_4437cfac" btree (event_id)
"api_ticket_e8701ad4" btree (user_id)
"api_ticket_points_earned_idx" btree (points_earned)
"api_ticket_rank_idx" btree ("rank")
Foreign-key constraints:
"api_ticket_event_id_598c97289edc0e3e_fk_api_event_id" FOREIGN KEY (event_id) REFERENCES api_event(id) DEFERRABLE INITIALLY DEFERRED
(user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Here's the query that I'm executing:
UPDATE api_ticket t SET is_tied = False
WHERE t.event_id IN (SELECT id FROM api_event WHERE status = 'c');
UPDATE api_ticket t SET is_tied = True
FROM (
SELECT event_id, rank
FROM api_ticket tt
WHERE event_id in (SELECT id FROM api_event WHERE status = 'c')
AND tt.status <> 'x'
GROUP BY rank, event_id
HAVING count(*) > 1
) AS tied_tickets
WHERE t.rank = tied_tickets.rank AND
tied_tickets.event_id = t.event_id;
Here's the explain on a set of about 35k rows:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Update on api_ticket t (cost=3590.01..603570.21 rows=157 width=128)
-> Nested Loop (cost=3590.01..603570.21 rows=157 width=128)
-> Subquery Scan on tied_tickets (cost=2543.31..2556.18 rows=572 width=40)
-> HashAggregate (cost=2543.31..2550.46 rows=572 width=8)
Filter: (count(*) > 1)
-> Nested Loop (cost=0.84..2539.02 rows=572 width=8)
-> Index Scan using api_event_status_idx1 on api_event (cost=0.29..8.31 rows=1 width=4)
Index Cond: ((status)::text = 'c'::text)
-> Index Scan using api_ticket_4437cfac on api_ticket tt (cost=0.55..2524.99 rows=572 width=8)
Index Cond: (event_id = api_event.id)
Filter: ((status)::text <> 'x'::text)
-> Bitmap Heap Scan on api_ticket t (cost=1046.70..1050.71 rows=1 width=92)
Recheck Cond: (("rank" = tied_tickets."rank") AND (event_id = tied_tickets.event_id))
-> BitmapAnd (cost=1046.70..1046.70 rows=1 width=0)
-> Bitmap Index Scan on api_ticket_rank_idx (cost=0.00..26.65 rows=708 width=0)
Index Cond: ("rank" = tied_tickets."rank")
-> Bitmap Index Scan on api_ticket_4437cfac (cost=0.00..1019.79 rows=573 width=0)
Index Cond: (event_id = tied_tickets.event_id)
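Would a single composite index matching the UPDATE's join condition let the planner avoid the BitmapAnd of the two single-column indexes? A sketch of what that could look like (the index name is illustrative, and this is untested against this schema):
-- hypothetical composite index matching WHERE t.rank = ... AND t.event_id = ...
CREATE INDEX api_ticket_event_id_rank_idx ON api_ticket (event_id, "rank");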
I'm running Geodjango/Postgres 9.1/PostGIS and I'm trying to get the following query (and others like it) to run faster.
[query snipped for brevity]
SELECT "crowdbreaks_incomingkeyword"."keyword_id"
, COUNT("crowdbreaks_incomingkeyword"."keyword_id") AS "cnt"
FROM "crowdbreaks_incomingkeyword"
INNER JOIN "crowdbreaks_tweet"
ON ("crowdbreaks_incomingkeyword"."tweet_id"
= "crowdbreaks_tweet"."tweet_id")
LEFT OUTER JOIN "crowdbreaks_place"
ON ("crowdbreaks_tweet"."place_id"
= "crowdbreaks_place"."place_id")
WHERE (("crowdbreaks_tweet"."coordinates"
# ST_GeomFromEWKB(E'\\001 ... \\000\\000\\000\\0008#'::bytea)
OR ST_Overlaps("crowdbreaks_place"."bounding_box"
, ST_GeomFromEWKB(E'\\001...00\\000\\0008#'::bytea)
))
AND "crowdbreaks_tweet"."created_at" > E'2012-04-17 15:46:12.109893'
AND "crowdbreaks_tweet"."created_at" < E'2012-04-18 15:46:12.109899' )
GROUP BY "crowdbreaks_incomingkeyword"."keyword_id"
, "crowdbreaks_incomingkeyword"."keyword_id"
;
Here is what the crowdbreaks_tweet table looks like:
\d+ crowdbreaks_tweet;
Table "public.crowdbreaks_tweet"
Column | Type | Modifiers | Storage | Description
---------------+--------------------------+-----------+----------+-------------
tweet_id | bigint | not null | plain |
tweeter | bigint | not null | plain |
text | text | not null | extended |
created_at | timestamp with time zone | not null | plain |
country_code | character varying(3) | | extended |
place_id | character varying(32) | | extended |
coordinates | geometry | | main |
Indexes:
"crowdbreaks_tweet_pkey" PRIMARY KEY, btree (tweet_id)
"crowdbreaks_tweet_coordinates_id" gist (coordinates)
"crowdbreaks_tweet_created_at" btree (created_at)
"crowdbreaks_tweet_place_id" btree (place_id)
"crowdbreaks_tweet_place_id_like" btree (place_id varchar_pattern_ops)
Check constraints:
"enforce_dims_coordinates" CHECK (st_ndims(coordinates) = 2)
"enforce_geotype_coordinates" CHECK (geometrytype(coordinates) = 'POINT'::text OR coordinates IS NULL)
"enforce_srid_coordinates" CHECK (st_srid(coordinates) = 4326)
Foreign-key constraints:
"crowdbreaks_tweet_place_id_fkey" FOREIGN KEY (place_id) REFERENCES crowdbreaks_place(place_id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "crowdbreaks_incomingkeyword" CONSTRAINT "crowdbreaks_incomingkeyword_tweet_id_fkey" FOREIGN KEY (tweet_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED
TABLE "crowdbreaks_tweetanswer" CONSTRAINT "crowdbreaks_tweetanswer_tweet_id_id_fkey" FOREIGN KEY (tweet_id_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no
And here is the explain analyze for the query:
HashAggregate (cost=184022.03..184023.18 rows=115 width=4) (actual time=6381.707..6381.769 rows=62 loops=1)
-> Hash Join (cost=103857.48..183600.24 rows=84357 width=4) (actual time=1745.449..6377.505 rows=3453 loops=1)
Hash Cond: (crowdbreaks_incomingkeyword.tweet_id = crowdbreaks_tweet.tweet_id)
-> Seq Scan on crowdbreaks_incomingkeyword (cost=0.00..36873.97 rows=2252597 width=12) (actual time=0.008..2136.839 rows=2252597 loops=1)
-> Hash (cost=102535.68..102535.68 rows=80544 width=8) (actual time=1744.815..1744.815 rows=3091 loops=1)
Buckets: 4096 Batches: 4 Memory Usage: 32kB
-> Hash Left Join (cost=16574.93..102535.68 rows=80544 width=8) (actual time=112.551..1740.651 rows=3091 loops=1)
Hash Cond: ((crowdbreaks_tweet.place_id)::text = (crowdbreaks_place.place_id)::text)
Filter: ((crowdbreaks_tweet.coordinates # '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) OR ((crowdbreaks_place.bounding_box && '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) AND _st_overlaps(crowdbreaks_place.bounding_box, '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry)))
-> Bitmap Heap Scan on crowdbreaks_tweet (cost=15874.18..67060.28 rows=747873 width=125) (actual time=96.012..940.462 rows=736784 loops=1)
Recheck Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone))
-> Bitmap Index Scan on crowdbreaks_tweet_created_at (cost=0.00..15687.22 rows=747873 width=0) (actual time=94.259..94.259 rows=736784 loops=1)
Index Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone))
-> Hash (cost=217.11..217.11 rows=6611 width=469) (actual time=15.926..15.926 rows=6611 loops=1)
Buckets: 1024 Batches: 4 Memory Usage: 259kB
-> Seq Scan on crowdbreaks_place (cost=0.00..217.11 rows=6611 width=469) (actual time=0.005..6.908 rows=6611 loops=1)
Total runtime: 6381.903 ms
(17 rows)
That's a pretty bad runtime for the query. Ideally, I'd like to get results back in a second or two.
I've increased shared_buffers on Postgres to 2GB (I have 8GB of RAM) but other than that I'm not quite sure what to do. What are my options? Should I do fewer joins? Are there any other indexes I can throw on there? The sequential scan on crowdbreaks_incomingkeyword doesn't make sense to me. It's a table of foreign keys to other tables, and thus has indexes on it.
Judging from your comment I would try two things:
1. Raise the statistics target for the involved columns (and run ANALYZE afterwards):
ALTER TABLE tbl ALTER COLUMN column SET STATISTICS 1000;
The data distribution may be uneven; a bigger sample may provide the query planner with more accurate estimates.
2. Play with the cost settings in postgresql.conf: your sequential scans might need to be more expensive relative to your index scans to give good estimates. Try lowering cpu_index_tuple_cost and setting effective_cache_size to as much as three quarters of your total RAM for a dedicated DB server.
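A session-level sketch of those knobs, with illustrative values only (the '6GB' figure assumes roughly three quarters of the 8GB of RAM mentioned above):
SET cpu_index_tuple_cost = 0.001;   -- default is 0.005
SET effective_cache_size = '6GB';   -- a planner hint; allocates no memory
If an experiment like this improves the plan, the same values can be made permanent in postgresql.conf.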
Our application has a very slow statement; it takes more than 11 seconds, so I want to know whether there is any way to optimize it.
The SQL statement
SELECT id FROM mapfriends.cell_forum_topic WHERE id in (
SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid )
AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
id
---------
2471959
2382296
1535967
2432006
2367281
2159706
1501759
1549304
2179763
1598043
(10 rows)
Time: 11444.976 ms
Plan
friends=> explain SELECT id FROM friends.cell_forum_topic WHERE id in (
friends(> SELECT topicid FROM friends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid)
friends-> AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443.15..1443.15 rows=2 width=12)
-> Sort (cost=1443.15..1443.15 rows=2 width=12)
Sort Key: cell_forum_topic.restoretime
-> Nested Loop (cost=1434.28..1443.14 rows=2 width=12)
-> HashAggregate (cost=1434.28..1434.30 rows=2 width=4)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1430.49 rows=1516 width=4)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
(10 rows)
Time: 1.109 ms
Indexes
friends=> \d cell_forum_item
Table "friends.cell_forum_item"
Column | Type | Modifiers
---------+--------------------------------+--------------------------------------------------------------
id | integer | not null default nextval('cell_forum_item_id_seq'::regclass)
topicid | integer | not null
skyid | integer | not null
content | character varying(200) |
addtime | timestamp(0) without time zone | default now()
ischeck | boolean |
Indexes:
"cell_forum_item_pkey" PRIMARY KEY, btree (id)
"cell_forum_item_idx" btree (topicid, skyid)
"cell_forum_item_idx_1" btree (topicid, id)
"cell_forum_item_idx_skyid" btree (skyid)
friends=> \d cell_forum_topic
Table "friends.cell_forum_topic"
Column | Type | Modifiers
-------------+--------------------------------+--------------------------------------------------------------------------------------
id | integer | not null default nextval(('"friends"."cell_forum_topic_id_seq"'::text)::regclass)
categoryid | integer | not null
topic | character varying | not null
content | character varying | not null
skyid | integer | not null
addtime | timestamp(0) without time zone | default now()
reference | integer | default 0
restore | integer | default 0
restoretime | timestamp(0) without time zone | default now()
locked | boolean | default false
settop | boolean | default false
hidden | boolean | default false
feature | boolean | default false
picid | integer | default 29249
managerid | integer |
imageid | integer | default 0
pass | boolean | default false
ischeck | boolean |
Indexes:
"cell_forum_topic_pkey" PRIMARY KEY, btree (id)
"idx_cell_forum_topic_1" btree (categoryid, settop, hidden, restoretime, skyid)
"idx_cell_forum_topic_2" btree (categoryid, hidden, restoretime, skyid)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
"idx_cell_forum_topic_4" btree (categoryid, hidden, restore)
"idx_cell_forum_topic_5" btree (categoryid, hidden, restoretime, feature)
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
Explain analyze
mapfriends=> explain analyze SELECT id FROM mapfriends.cell_forum_topic
mapfriends-> join (SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid) as tmp
mapfriends-> on mapfriends.cell_forum_topic.id=tmp.topicid
mapfriends-> where categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.006..18016.013 rows=10 loops=1)
-> Sort (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.001..18016.002 rows=10 loops=1)
Sort Key: cell_forum_topic.restoretime
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1438.02..1446.88 rows=2 width=12) (actual time=16988.492..18015.869 rows=20 loops=1)
-> HashAggregate (cost=1438.02..1438.04 rows=2 width=4) (actual time=15446.735..15447.243 rows=610 loops=1)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12) (actual time=4.210..4.210 rows=0 loops=610)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
Total runtime: 18019.461 ms
Could you give us some more information about the tables (the statistics) and the configuration?
SELECT version();
SELECT category, name, setting FROM pg_settings WHERE name IN('effective_cache_size', 'enable_seqscan', 'shared_buffers');
SELECT * FROM pg_stat_user_tables WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stat_user_indexes WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stats WHERE tablename IN('cell_forum_topic', 'cell_forum_item');
And before getting this data, use ANALYZE.
It looks like you have a problem with an index; this is where the query spends all its time:
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item
   (cost=0.00..1434.22 rows=1520 width=4)
   (actual time=302.378..15429.782 rows=7133 loops=1)
If you use VACUUM FULL on a regular basis (NOT RECOMMENDED!), index bloat might be your problem. A REINDEX might be a good idea, just to be sure:
REINDEX TABLE cell_forum_item;
And talking about indexes, you can drop a couple of them; these are redundant:
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
Other indexes cover the same leading columns and can be used by the database just as well.
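A sketch of the cleanup (verify against your actual workload before dropping anything):
DROP INDEX friends.idx_cell_forum_topic_3;
DROP INDEX friends.idx_cell_forum_topic_6;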
It looks like you have a couple of problems:
1. autovacuum is turned off or it's way behind. The last autovacuum was on 2010-12-02, and you have 256734 dead tuples in one table and 451430 dead ones in the other. You have to do something about this; it is a serious problem.
2. When autovacuum is working again, you have to do a VACUUM FULL and a REINDEX to force a table rewrite and get rid of all the empty space in your tables.
3. After fixing the vacuum problem, you have to ANALYZE as well: the database expects 1520 results but it gets 7133. This could be a problem with statistics; maybe you have to increase the STATISTICS target.
4. The query itself needs some rewriting as well: it gets 7133 results but it needs only 610. Over 90% of the results are lost, and getting those 7133 takes a lot of time, over 15 seconds. Get rid of the subquery by using a JOIN without the GROUP BY, or use EXISTS, also without the GROUP BY (see the sketch below).
But first get autovacuum back on track, before you get new or other problems.
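A minimal sketch of that EXISTS rewrite, reusing the schema and constants from the question (untested here):
SELECT t.id
FROM mapfriends.cell_forum_topic t
WHERE t.categoryid = 29
  AND t.hidden = false
  AND EXISTS (SELECT 1
              FROM mapfriends.cell_forum_item i
              WHERE i.topicid = t.id
                AND i.skyid = 103230293)
ORDER BY t.restoretime DESC
LIMIT 10 OFFSET 0;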
The problem isn't due to a lack of query-plan caching, but most likely to the planner's choice of plan, caused by a lack of appropriate indexes.