I have a simple query that must fetch one record from a table with about 14 million records:
EXPLAIN ANALYZE SELECT "projects_toolresult"."id",
"projects_toolresult"."tool_id",
"projects_toolresult"."status",
"projects_toolresult"."updated_at",
"projects_toolresult"."created_at" FROM
"projects_toolresult" WHERE
("projects_toolresult"."status" = 1 AND
"projects_toolresult"."tool_id" = 21)
ORDER BY "projects_toolresult"."updated_at"
DESC LIMIT 1;
And it is weird that when I order the query by the updated_at field, it takes 60 seconds to execute:
Limit (cost=0.43..510.94 rows=1 width=151) (actual time=56754.932..56754.932 rows=0 loops=1)
  ->  Index Scan using projects_to_updated_266459_idx on projects_toolresult (cost=0.43..1800549.09 rows=3527 width=151) (actual time=56754.930..56754.930 rows=0 loops=1)
        Filter: ((status = 1) AND (tool_id = 21))
        Rows Removed by Filter: 13709343
Planning time: 0.236 ms
Execution time: 56754.968 ms
(6 rows)
It makes no difference whether the order is ASC or DESC.
But if I ORDER BY random() or use no ORDER BY at all:
Limit (cost=23496.10..23496.10 rows=1 width=151) (actual time=447.532..447.532 rows=0 loops=1)
-> Sort (cost=23496.10..23505.20 rows=3642 width=151) (actual time=447.530..447.530 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Index Scan using projects_toolresult_tool_id_34a3bb16 on projects_toolresult (cost=0.56..23477.89 rows=3642 width=151) (actual time=447.513..447.513 rows=0 loops=1)
Index Cond: (tool_id = 21)
Filter: (status = 1)
Rows Removed by Filter: 6097
Planning time: 0.224 ms
Execution time: 447.571 ms
(10 rows)
It works fast.
I have indexes on the updated_at and status fields (I also tried without them). I tuned the default Postgres settings, increasing values with this generator: https://pgtune.leopard.in.ua/#/
And this is what happens when these queries run.
Postgres version 9.5
My table and indexes:
id | integer | not null default nextval('projects_toolresult_id_seq'::regclass)
status | smallint | not null
object_id | integer | not null
created_at | timestamp with time zone | not null
content_type_id | integer | not null
tool_id | integer | not null
updated_at | timestamp with time zone | not null
output_data | text | not null
Indexes:
"projects_toolresult_pkey" PRIMARY KEY, btree (id)
"projects_toolresult_content_type_id_object_i_71ee2c2e_uniq" UNIQUE CONSTRAINT, btree (content_type_id, object_id, tool_id)
"projects_to_created_cee389_idx" btree (created_at)
"projects_to_tool_id_ec7856_idx" btree (tool_id, status)
"projects_to_updated_266459_idx" btree (updated_at)
"projects_toolresult_content_type_id_9924d905" btree (content_type_id)
"projects_toolresult_tool_id_34a3bb16" btree (tool_id)
Check constraints:
"projects_toolresult_object_id_check" CHECK (object_id >= 0)
"projects_toolresult_status_check" CHECK (status >= 0)
Foreign-key constraints:
"projects_toolresult_content_type_id_9924d905_fk_django_co" FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED
"projects_toolresult_tool_id_34a3bb16_fk_projects_tool_id" FOREIGN KEY (tool_id) REFERENCES projects_tool(id) DEFERRABLE INITIALLY DEFERRED
You are filtering your data on status and tool_id and sorting on updated_at, but you have no single index that covers all three of those columns.
Add one, like so:
CREATE INDEX ON projects_toolresult (status, tool_id, updated_at);
With the two equality columns first and the sort column last, Postgres can descend straight to the matching range of the index and read it already sorted by updated_at, so the LIMIT 1 touches a single index entry instead of filtering 13.7 million rows.
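If this query only ever filters on status = 1, a partial index is a possible alternative worth sketching (the index name and the single-status assumption are mine, not from the question); it is smaller, cheaper to maintain, and supports the same ordered scan:

```sql
-- Sketch: a partial index covering only the rows this query can return.
-- Assumes status = 1 is the only status value ever used in this filter.
CREATE INDEX projects_toolresult_active_idx
    ON projects_toolresult (tool_id, updated_at)
    WHERE status = 1;
```

The planner can then answer the query with a backward scan of the tool_id = 21 slice of this index.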
Related
I have a query and I have created indexes specifically for it. But I just discovered that with some particular input values the query stops running fast and does a full scan instead.
Here is the case of fast execution:
explain analyze SELECT
v.valtr_id,
v.block_num,
v.from_id,
v.to_id,
v.from_balance::text,
v.to_balance::text
FROM value_transfer v
WHERE
(v.block_num<=2748053) AND
(
(v.to_id=639291) OR
(v.from_id=639291)
)
ORDER BY
v.block_num DESC,v.valtr_id DESC
LIMIT 1
Limit (cost=23054.03..23054.03 rows=1 width=30) (actual time=1.464..1.465 rows=1 loops=1)
-> Sort (cost=23054.03..23068.94 rows=5964 width=30) (actual time=1.462..1.462 rows=1 loops=1)
Sort Key: block_num DESC, valtr_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on value_transfer v (cost=144.85..23024.21 rows=5964 width=30) (actual time=1.397..1.437 rows=3 loops=1)
Recheck Cond: ((to_id = 639291) OR (from_id = 639291))
Filter: (block_num <= 2748053)
Heap Blocks: exact=3
-> BitmapOr (cost=144.85..144.85 rows=5964 width=0) (actual time=1.339..1.339 rows=0 loops=1)
-> Bitmap Index Scan on vt_to_id_idx (cost=0.00..40.42 rows=1580 width=0) (actual time=0.755..0.755 rows=1 loops=1)
Index Cond: (to_id = 639291)
-> Bitmap Index Scan on vt_from_id_idx (cost=0.00..101.45 rows=4384 width=0) (actual time=0.580..0.580 rows=2 loops=1)
Index Cond: (from_id = 639291)
Planning time: 0.499 ms
Execution time: 1.556 ms
(15 rows)
But if I use the value 199658 as the input to my query, Postgres chooses a different plan:
explain analyze SELECT
v.valtr_id,
v.block_num,
v.from_id,
v.to_id,
v.from_balance::text,
v.to_balance::text
FROM value_transfer v
WHERE
(v.block_num<=2748053) AND
(
(v.to_id=199658) OR
(v.from_id=199658)
)
ORDER BY
v.block_num DESC,v.valtr_id DESC
LIMIT 1 ;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..6462.99 rows=1 width=30) (actual time=614109.855..614109.856 rows=1 loops=1)
-> Index Scan Backward using bnum_valtr_idx on value_transfer v (cost=0.57..200845479.66 rows=31079 width=30) (actual time=614109.853..614109.853 rows=1 loops=1)
Index Cond: (block_num <= 2748053)
Filter: ((to_id = 199658) OR (from_id = 199658))
Rows Removed by Filter: 101190609
Planning time: 0.515 ms
Execution time: 614109.920 ms
(7 rows)
Why is this happening? I thought that once you have created the indexes for your query, execution would always take the same path, but that is not the case. How can I make sure Postgres always uses the same plan for every search?
I even suspected that the indexes weren't built cleanly, so I rebuilt the main index:
postgres=> drop index bnum_valtr_idx;
DROP INDEX
postgres=> CREATE INDEX bnum_valtr_idx ON public.value_transfer USING btree (block_num DESC, valtr_id DESC);
CREATE INDEX
postgres=>
However, this didn't change anything.
My table definitions are:
CREATE TABLE value_transfer (
valtr_id BIGSERIAL PRIMARY KEY,
tx_id BIGINT REFERENCES transaction(tx_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_id INT REFERENCES block(block_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_num INT NOT NULL,
from_id INT NOT NULL,
to_id INT NOT NULL,
value NUMERIC DEFAULT 0,
from_balance NUMERIC DEFAULT 0,
to_balance NUMERIC DEFAULT 0,
kind CHAR NOT NULL,
depth INT DEFAULT 0,
error TEXT NOT NULL
);
postgres=> SELECT * FROM pg_indexes WHERE tablename = 'value_transfer';
schemaname | tablename | indexname | tablespace | indexdef
------------+----------------+---------------------+------------+--------------------------------------------------------------------------------------------------
public | value_transfer | bnum_valtr_idx | | CREATE INDEX bnum_valtr_idx ON public.value_transfer USING btree (block_num DESC, valtr_id DESC)
public | value_transfer | value_transfer_pkey | | CREATE UNIQUE INDEX value_transfer_pkey ON public.value_transfer USING btree (valtr_id)
public | value_transfer | vt_tx_from_idx | | CREATE INDEX vt_tx_from_idx ON public.value_transfer USING btree (tx_id)
public | value_transfer | vt_block_num_idx | | CREATE INDEX vt_block_num_idx ON public.value_transfer USING btree (block_num)
public | value_transfer | vt_from_id_idx | | CREATE INDEX vt_from_id_idx ON public.value_transfer USING btree (from_id)
public | value_transfer | vt_to_id_idx | | CREATE INDEX vt_to_id_idx ON public.value_transfer USING btree (to_id)
public | value_transfer | vt_block_id_idx | | CREATE INDEX vt_block_id_idx ON public.value_transfer USING btree (block_id)
(7 rows)
postgres=>
It could be that one value mostly appears in one column and the other value in the other column. Regardless, an OR across different columns is notorious for causing performance problems: a plain index scan can only use one index, but the OR would need two indexes to check both columns quickly, so one column is checked via its index and the other requires a scan.
The way around this problem is to break the query into a union.
Try this:
SELECT * FROM (
SELECT
valtr_id,
block_num,
from_id,
to_id,
from_balance::text,
to_balance::text
FROM value_transfer
WHERE block_num<=2748053
AND to_id=199658
UNION ALL
SELECT
valtr_id,
block_num,
from_id,
to_id,
from_balance::text,
to_balance::text
FROM value_transfer
WHERE block_num<=2748053
AND from_id=199658
) x
ORDER BY block_num DESC, valtr_id DESC
LIMIT 1
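Since only the single newest row is needed, each branch of the union can also carry its own ORDER BY and LIMIT 1, so each side becomes one index probe rather than a full range scan. This variant is a sketch along the same lines; it assumes indexes on (to_id, block_num DESC, valtr_id DESC) and (from_id, block_num DESC, valtr_id DESC) exist or are added:

```sql
SELECT * FROM (
    (SELECT valtr_id, block_num, from_id, to_id,
            from_balance::text, to_balance::text
     FROM value_transfer
     WHERE block_num <= 2748053 AND to_id = 199658
     ORDER BY block_num DESC, valtr_id DESC
     LIMIT 1)
    UNION ALL
    (SELECT valtr_id, block_num, from_id, to_id,
            from_balance::text, to_balance::text
     FROM value_transfer
     WHERE block_num <= 2748053 AND from_id = 199658
     ORDER BY block_num DESC, valtr_id DESC
     LIMIT 1)
) x
ORDER BY block_num DESC, valtr_id DESC
LIMIT 1;
```

The parentheses around each branch are required so that the inner ORDER BY/LIMIT applies to the branch rather than to the whole union.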
If I add an extra ORDER BY column, Postgres switches to a bitmap scan and performance drops dramatically (4 seconds vs 0.06 milliseconds). The application becomes unusable. Yet I am only asking it to order a small set of results, which are indexed, by the way.
How should I modify my query so that Postgres uses the index instead of a bitmap scan? Using the index is what it should do; I have 100 million records in the table.
Slow query, Bitmap scan:
EXPLAIN ANALYZE
SELECT
valtr_id,
from_id,
to_id,
from_balance,
to_balance,
block_num
FROM value_transfer v
WHERE v.block_num<=2435013 AND v.to_id = 22479
ORDER BY block_num desc,valtr_id desc
LIMIT 1
Limit (cost=1235402.27..1235402.27 rows=1 width=32) (actual time=4665.595..4665.596 rows=1 loops=1)
-> Sort (cost=1235402.27..1238237.41 rows=1134056 width=32) (actual time=4665.594..4665.594 rows=1 loops=1)
Sort Key: block_num DESC, valtr_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on value_transfer v (cost=21229.61..1229731.99 rows=1134056 width=32) (actual time=268.917..4170.374 rows=1102867 loops=1)
Recheck Cond: (to_id = 22479)
Rows Removed by Index Recheck: 9412580
Filter: (block_num <= 2435013)
Heap Blocks: exact=32392 lossy=132879
-> Bitmap Index Scan on vt_to_id_idx (cost=0.00..20946.10 rows=1134071 width=0) (actual time=254.870..254.870 rows=1102867 loops=1)
Index Cond: (to_id = 22479)
Planning time: 0.290 ms
Execution time: 4665.634 ms
(13 rows)
Now, if I remove just one ORDER BY column, the query is orders of magnitude faster.
Without ORDER BY valtr_id DESC I get the following performance:
EXPLAIN ANALYZE
SELECT
valtr_id,
from_id,
to_id,
from_balance,
to_balance,
block_num
FROM value_transfer v
WHERE v.block_num<=2435013 AND v.to_id = 22479
ORDER BY block_num desc
LIMIT 1
Limit (cost=0.57..2.46 rows=1 width=32) (actual time=0.028..0.028 rows=1 loops=1)
-> Index Scan using idx_2 on value_transfer v (cost=0.57..2148650.88 rows=1134056 width=32) (actual time=0.027..0.027 rows=1 loops=1)
Index Cond: ((to_id = 22479) AND (block_num <= 2435013))
Planning time: 0.310 ms
Execution time: 0.060 ms
(5 rows)
How do I tell Postgres to use the index first, and only after that sort the results?
My table is defined like this:
CREATE TABLE value_transfer (
valtr_id BIGSERIAL PRIMARY KEY,
tx_id BIGINT REFERENCES transaction(tx_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_id INT REFERENCES block(block_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_num INT NOT NULL,
from_id INT NOT NULL,
to_id INT NOT NULL,
value NUMERIC DEFAULT 0,
from_balance NUMERIC DEFAULT 0,
to_balance NUMERIC DEFAULT 0,
kind CHAR NOT NULL,
depth INT DEFAULT 0,
error TEXT NOT NULL
);
I have created lots of different indexes during my tests:
indexname | indexdef
---------------------+-----------------------------------------------------------------------------------------
value_transfer_pkey | CREATE UNIQUE INDEX value_transfer_pkey ON public.value_transfer USING btree (valtr_id)
vt_block_id_idx | CREATE INDEX vt_block_id_idx ON public.value_transfer USING btree (block_id)
vt_block_num_idx | CREATE INDEX vt_block_num_idx ON public.value_transfer USING btree (block_num)
vt_from_id_idx | CREATE INDEX vt_from_id_idx ON public.value_transfer USING btree (from_id)
vt_to_id_idx | CREATE INDEX vt_to_id_idx ON public.value_transfer USING btree (to_id)
vt_tx_from_idx | CREATE INDEX vt_tx_from_idx ON public.value_transfer USING btree (tx_id)
idx_1 | CREATE INDEX idx_1 ON public.value_transfer USING btree (from_id, block_num DESC)
idx_2 | CREATE INDEX idx_2 ON public.value_transfer USING btree (to_id, block_num DESC)
idx_1_rev | CREATE INDEX idx_1_rev ON public.value_transfer USING btree (block_num DESC, from_id)
idx_2_rev | CREATE INDEX idx_2_rev ON public.value_transfer USING btree (block_num DESC, to_id)
valtr_ordered_idx | CREATE INDEX valtr_ordered_idx ON public.value_transfer USING btree (valtr_id)
(11 rows)
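One plausible fix, sketched here as an assumption rather than something from the question: an index whose key order matches both the equality filter and the full ORDER BY lets Postgres return the rows already sorted, so no separate sort or bitmap scan is needed and the LIMIT 1 is a single index probe:

```sql
-- Sketch: extends the existing idx_2 with valtr_id, so that within a
-- given to_id the entries are ordered exactly as the query's
-- ORDER BY block_num DESC, valtr_id DESC requires.
CREATE INDEX idx_2_full
    ON value_transfer (to_id, block_num DESC, valtr_id DESC);
```

The hypothetical name idx_2_full is mine; any name works.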
I'm trying to determine if there is a "low cost" optimization for the following query. We've implemented a system whereby 'tickets' earn 'points' and can thus be ranked. To support analytical queries, we store each ticket's rank, and whether the ticket is tied, along with the ticket.
I've found that, at scale, storing the is_tied field is very slow. I'm attempting to run the scenario below on sets of about 20-75k tickets.
I'm hoping that someone can help identify why and offer some help.
We're on Postgres 9.3.6.
Here's a simplified ticket table schema:
ogs_1=> \d api_ticket
Table "public.api_ticket"
Column | Type | Modifiers
------------------------------+--------------------------+---------------------------------------------------------
id | integer | not null default nextval('api_ticket_id_seq'::regclass)
status | character varying(3) | not null
points_earned | integer | not null
rank | integer | not null
event_id | integer | not null
user_id | integer | not null
is_tied | boolean | not null
Indexes:
"api_ticket_pkey" PRIMARY KEY, btree (id)
"api_ticket_4437cfac" btree (event_id)
"api_ticket_e8701ad4" btree (user_id)
"api_ticket_points_earned_idx" btree (points_earned)
"api_ticket_rank_idx" btree ("rank")
Foreign-key constraints:
"api_ticket_event_id_598c97289edc0e3e_fk_api_event_id" FOREIGN KEY (event_id) REFERENCES api_event(id) DEFERRABLE INITIALLY DEFERRED
(user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Here's the query that I'm executing:
UPDATE api_ticket t SET is_tied = False
WHERE t.event_id IN (SELECT id FROM api_event WHERE status = 'c');
UPDATE api_ticket t SET is_tied = True
FROM (
SELECT event_id, rank
FROM api_ticket tt
WHERE event_id in (SELECT id FROM api_event WHERE status = 'c')
AND tt.status <> 'x'
GROUP BY rank, event_id
HAVING count(*) > 1
) AS tied_tickets
WHERE t.rank = tied_tickets.rank AND
tied_tickets.event_id = t.event_id;
Here's the explain on a set of about 35k rows:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Update on api_ticket t (cost=3590.01..603570.21 rows=157 width=128)
-> Nested Loop (cost=3590.01..603570.21 rows=157 width=128)
-> Subquery Scan on tied_tickets (cost=2543.31..2556.18 rows=572 width=40)
-> HashAggregate (cost=2543.31..2550.46 rows=572 width=8)
Filter: (count(*) > 1)
-> Nested Loop (cost=0.84..2539.02 rows=572 width=8)
-> Index Scan using api_event_status_idx1 on api_event (cost=0.29..8.31 rows=1 width=4)
Index Cond: ((status)::text = 'c'::text)
-> Index Scan using api_ticket_4437cfac on api_ticket tt (cost=0.55..2524.99 rows=572 width=8)
Index Cond: (event_id = api_event.id)
Filter: ((status)::text <> 'x'::text)
-> Bitmap Heap Scan on api_ticket t (cost=1046.70..1050.71 rows=1 width=92)
Recheck Cond: (("rank" = tied_tickets."rank") AND (event_id = tied_tickets.event_id))
-> BitmapAnd (cost=1046.70..1046.70 rows=1 width=0)
-> Bitmap Index Scan on api_ticket_rank_idx (cost=0.00..26.65 rows=708 width=0)
Index Cond: ("rank" = tied_tickets."rank")
-> Bitmap Index Scan on api_ticket_4437cfac (cost=0.00..1019.79 rows=573 width=0)
Index Cond: (event_id = tied_tickets.event_id)
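The plan above combines two single-column bitmap scans with a BitmapAnd for every tied (rank, event_id) pair. A composite index matching the join condition is a sketch of one plausible low-cost improvement (my suggestion, not from the question):

```sql
-- Sketch: lets the inner lookup on (event_id, rank) use a single index
-- probe instead of BitmapAnd-ing api_ticket_4437cfac with
-- api_ticket_rank_idx for each row of tied_tickets.
CREATE INDEX api_ticket_event_rank_idx
    ON api_ticket (event_id, "rank");
```

With this index in place, api_ticket_4437cfac (event_id alone) may become redundant for these queries.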
I'm running GeoDjango/Postgres 9.1/PostGIS and I'm trying to get the following query (and others like it) to run faster.
[query snipped for brevity]
SELECT "crowdbreaks_incomingkeyword"."keyword_id"
, COUNT("crowdbreaks_incomingkeyword"."keyword_id") AS "cnt"
FROM "crowdbreaks_incomingkeyword"
INNER JOIN "crowdbreaks_tweet"
ON ("crowdbreaks_incomingkeyword"."tweet_id"
= "crowdbreaks_tweet"."tweet_id")
LEFT OUTER JOIN "crowdbreaks_place"
ON ("crowdbreaks_tweet"."place_id"
= "crowdbreaks_place"."place_id")
WHERE (("crowdbreaks_tweet"."coordinates"
# ST_GeomFromEWKB(E'\\001 ... \\000\\000\\000\\0008#'::bytea)
OR ST_Overlaps("crowdbreaks_place"."bounding_box"
, ST_GeomFromEWKB(E'\\001...00\\000\\0008#'::bytea)
))
AND "crowdbreaks_tweet"."created_at" > E'2012-04-17 15:46:12.109893'
AND "crowdbreaks_tweet"."created_at" < E'2012-04-18 15:46:12.109899' )
GROUP BY "crowdbreaks_incomingkeyword"."keyword_id"
, "crowdbreaks_incomingkeyword"."keyword_id"
;
Here is what the crowdbreaks_tweet table looks like:
\d+ crowdbreaks_tweet;
Table "public.crowdbreaks_tweet"
Column | Type | Modifiers | Storage | Description
---------------+--------------------------+-----------+----------+-------------
tweet_id | bigint | not null | plain |
tweeter | bigint | not null | plain |
text | text | not null | extended |
created_at | timestamp with time zone | not null | plain |
country_code | character varying(3) | | extended |
place_id | character varying(32) | | extended |
coordinates | geometry | | main |
Indexes:
"crowdbreaks_tweet_pkey" PRIMARY KEY, btree (tweet_id)
"crowdbreaks_tweet_coordinates_id" gist (coordinates)
"crowdbreaks_tweet_created_at" btree (created_at)
"crowdbreaks_tweet_place_id" btree (place_id)
"crowdbreaks_tweet_place_id_like" btree (place_id varchar_pattern_ops)
Check constraints:
"enforce_dims_coordinates" CHECK (st_ndims(coordinates) = 2)
"enforce_geotype_coordinates" CHECK (geometrytype(coordinates) = 'POINT'::text OR coordinates IS NULL)
"enforce_srid_coordinates" CHECK (st_srid(coordinates) = 4326)
Foreign-key constraints:
"crowdbreaks_tweet_place_id_fkey" FOREIGN KEY (place_id) REFERENCES crowdbreaks_place(place_id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "crowdbreaks_incomingkeyword" CONSTRAINT "crowdbreaks_incomingkeyword_tweet_id_fkey" FOREIGN KEY (tweet_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED
TABLE "crowdbreaks_tweetanswer" CONSTRAINT "crowdbreaks_tweetanswer_tweet_id_id_fkey" FOREIGN KEY (tweet_id_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no
And here is the explain analyze for the query:
HashAggregate (cost=184022.03..184023.18 rows=115 width=4) (actual time=6381.707..6381.769 rows=62 loops=1)
-> Hash Join (cost=103857.48..183600.24 rows=84357 width=4) (actual time=1745.449..6377.505 rows=3453 loops=1)
Hash Cond: (crowdbreaks_incomingkeyword.tweet_id = crowdbreaks_tweet.tweet_id)
-> Seq Scan on crowdbreaks_incomingkeyword (cost=0.00..36873.97 rows=2252597 width=12) (actual time=0.008..2136.839 rows=2252597 loops=1)
-> Hash (cost=102535.68..102535.68 rows=80544 width=8) (actual time=1744.815..1744.815 rows=3091 loops=1)
Buckets: 4096 Batches: 4 Memory Usage: 32kB
-> Hash Left Join (cost=16574.93..102535.68 rows=80544 width=8) (actual time=112.551..1740.651 rows=3091 loops=1)
Hash Cond: ((crowdbreaks_tweet.place_id)::text = (crowdbreaks_place.place_id)::text)
Filter: ((crowdbreaks_tweet.coordinates # '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) OR ((crowdbreaks_place.bounding_box && '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) AND _st_overlaps(crowdbreaks_place.bounding_box, '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry)))
-> Bitmap Heap Scan on crowdbreaks_tweet (cost=15874.18..67060.28 rows=747873 width=125) (actual time=96.012..940.462 rows=736784 loops=1)
Recheck Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone))
-> Bitmap Index Scan on crowdbreaks_tweet_created_at (cost=0.00..15687.22 rows=747873 width=0) (actual time=94.259..94.259 rows=736784 loops=1)
Index Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone))
-> Hash (cost=217.11..217.11 rows=6611 width=469) (actual time=15.926..15.926 rows=6611 loops=1)
Buckets: 1024 Batches: 4 Memory Usage: 259kB
-> Seq Scan on crowdbreaks_place (cost=0.00..217.11 rows=6611 width=469) (actual time=0.005..6.908 rows=6611 loops=1)
Total runtime: 6381.903 ms
(17 rows)
That's a pretty bad runtime for the query. Ideally, I'd like to get results back in a second or two.
I've increased shared_buffers on Postgres to 2GB (I have 8GB of RAM), but other than that I'm not quite sure what to do. What are my options? Should I do fewer joins? Are there any other indexes I can add? The sequential scan on crowdbreaks_incomingkeyword doesn't make sense to me: it's a table of foreign keys to other tables, and thus has indexes on those columns.
Judging from your comment I would try two things:
Raise the statistics target for the involved columns (and run ANALYZE):
ALTER TABLE tbl ALTER COLUMN column SET STATISTICS 1000;
The data distribution may be uneven. A bigger sample may give the query planner more accurate estimates.
Play with the cost settings in postgresql.conf. Your sequential scans might need to be more expensive relative to your index scans to produce good plans.
Try lowering cpu_index_tuple_cost, and set effective_cache_size to something as high as three quarters of your total RAM for a dedicated DB server.
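As a sketch of what those experiments might look like for a dedicated 8 GB server (the values are illustrative assumptions, not tuned recommendations):

```sql
-- Illustrative session-level experiment; move the values that help
-- into postgresql.conf once validated against real workloads.
SET effective_cache_size = '6GB';   -- roughly three quarters of 8 GB RAM
SET cpu_index_tuple_cost = 0.001;   -- default is 0.005
SET random_page_cost = 2.0;         -- default is 4.0; lower favors index scans
```

Setting these per session first lets you compare EXPLAIN ANALYZE output before and after without affecting other connections.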
Our application has a very slow statement; it takes more than 11 seconds. Is there any way to optimize it?
The SQL statement
SELECT id FROM mapfriends.cell_forum_topic WHERE id in (
SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid )
AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
id
---------
2471959
2382296
1535967
2432006
2367281
2159706
1501759
1549304
2179763
1598043
(10 rows)
Time: 11444.976 ms
Plan
friends=> explain SELECT id FROM friends.cell_forum_topic WHERE id in (
friends(> SELECT topicid FROM friends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid)
friends-> AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443.15..1443.15 rows=2 width=12)
-> Sort (cost=1443.15..1443.15 rows=2 width=12)
Sort Key: cell_forum_topic.restoretime
-> Nested Loop (cost=1434.28..1443.14 rows=2 width=12)
-> HashAggregate (cost=1434.28..1434.30 rows=2 width=4)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1430.49 rows=1516 width=4)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
(10 rows)
Time: 1.109 ms
Indexes
friends=> \d cell_forum_item
Table "friends.cell_forum_item"
Column | Type | Modifiers
---------+--------------------------------+--------------------------------------------------------------
id | integer | not null default nextval('cell_forum_item_id_seq'::regclass)
topicid | integer | not null
skyid | integer | not null
content | character varying(200) |
addtime | timestamp(0) without time zone | default now()
ischeck | boolean |
Indexes:
"cell_forum_item_pkey" PRIMARY KEY, btree (id)
"cell_forum_item_idx" btree (topicid, skyid)
"cell_forum_item_idx_1" btree (topicid, id)
"cell_forum_item_idx_skyid" btree (skyid)
friends=> \d cell_forum_topic
Table "friends.cell_forum_topic"
Column | Type | Modifiers
-------------+--------------------------------+-------------------------------------------------------------------------------------
-
id | integer | not null default nextval(('"friends"."cell_forum_topic_id_seq"'::text)::regclass)
categoryid | integer | not null
topic | character varying | not null
content | character varying | not null
skyid | integer | not null
addtime | timestamp(0) without time zone | default now()
reference | integer | default 0
restore | integer | default 0
restoretime | timestamp(0) without time zone | default now()
locked | boolean | default false
settop | boolean | default false
hidden | boolean | default false
feature | boolean | default false
picid | integer | default 29249
managerid | integer |
imageid | integer | default 0
pass | boolean | default false
ischeck | boolean |
Indexes:
"cell_forum_topic_pkey" PRIMARY KEY, btree (id)
"idx_cell_forum_topic_1" btree (categoryid, settop, hidden, restoretime, skyid)
"idx_cell_forum_topic_2" btree (categoryid, hidden, restoretime, skyid)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
"idx_cell_forum_topic_4" btree (categoryid, hidden, restore)
"idx_cell_forum_topic_5" btree (categoryid, hidden, restoretime, feature)
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
Explain analyze
mapfriends=> explain analyze SELECT id FROM mapfriends.cell_forum_topic
mapfriends-> join (SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid) as tmp
mapfriends-> on mapfriends.cell_forum_topic.id=tmp.topicid
mapfriends-> where categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.006..18016.013 rows=10 loops=1)
-> Sort (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.001..18016.002 rows=10 loops=1)
Sort Key: cell_forum_topic.restoretime
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1438.02..1446.88 rows=2 width=12) (actual time=16988.492..18015.869 rows=20 loops=1)
-> HashAggregate (cost=1438.02..1438.04 rows=2 width=4) (actual time=15446.735..15447.243 rows=610 loops=1)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12) (actual time=4.210..4.210 rows=0 loops=610)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
Total runtime: 18019.461 ms
Could you give us some more information about the tables (the statistics) and the configuration?
SELECT version();
SELECT category, name, setting FROM pg_settings WHERE name IN('effective_cache_size', 'enable_seqscan', 'shared_buffers');
SELECT * FROM pg_stat_user_tables WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stat_user_indexes WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stats WHERE tablename IN('cell_forum_topic', 'cell_forum_item');
And before getting this data, use ANALYZE.
It looks like you have a problem with an index; this is where the query spends all its time:
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
If you use VACUUM FULL on a regular basis (NOT RECOMMENDED!), index bloat might be your problem. A REINDEX might be a good idea, just to be sure:
REINDEX TABLE cell_forum_item;
And talking about indexes, you can drop a couple of them; these are redundant:
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
Other indexes cover the same leading columns, so the database can use them to answer the same queries.
It looks like you have a couple of problems:

autovacuum is turned off or it's way behind. The last autovacuum was on 2010-12-02, and you have 256734 dead tuples in one table and 451430 dead ones in the other. You have to do something about this; it is a serious problem.

When autovacuum is working again, you have to do a VACUUM FULL and a REINDEX to force a table rewrite and get rid of all the empty space in your tables.

After fixing the vacuum problem, you have to ANALYZE as well: the database expects 1520 results but gets 7133. This could be a statistics problem; maybe you have to increase the STATISTICS target.

The query itself needs some rewriting as well: it gets 7133 results but needs only 610. Over 90% of the results are thrown away, and fetching those 7133 takes over 15 seconds. Get rid of the subquery by using a JOIN without the GROUP BY, or use EXISTS, also without the GROUP BY.

But first get autovacuum back on track, before you get new or other problems.
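The EXISTS rewrite suggested above might look like this sketch (same columns and filter values as the original query; untested against the actual schema):

```sql
SELECT id
FROM mapfriends.cell_forum_topic t
WHERE EXISTS (
        SELECT 1
        FROM mapfriends.cell_forum_item i
        WHERE i.topicid = t.id
          AND i.skyid = 103230293)
  AND t.categoryid = 29
  AND t.hidden = false
ORDER BY t.restoretime DESC
LIMIT 10 OFFSET 0;
```

This lets the planner stop probing cell_forum_item as soon as one matching row per topic is found, instead of aggregating all 7133 rows first.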
The problem isn't due to a lack of query-plan caching, but most likely to the choice of plan caused by the lack of appropriate indexes.