Speed up this query for creating a materialized view? - postgresql

I'm using Postgres 9.4 on OSX with 16GB of RAM. I am creating a materialized view on my Postgres tables, but it is tremendously slow to create and I'm wondering if there is any way to speed things up.
These are the underlying tables, showing spending items and organisations:
Table "public.spending_item"
Column | Type | Modifiers
-------------------+-------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('spending_item_id_seq'::regclass)
presentation_code | character varying(15) | not null
presentation_name | character varying(1000) | not null
total_items | integer | not null
actual_cost | double precision | not null
processing_date | date | not null
organisation_id | character varying(9) | not null
Table "public.organisation"
Column | Type | Modifiers
-----------+------------------------+-----------
code | character varying(9) | not null
name | character varying(200) | not null
There are 100m rows in the spending_item table, and around 10,000 in the organisation table.
The materialized view I want to create is as follows. It is essentially total spending and total items per month, grouped by presentation code and organisation.
(I'm creating it because I need queries like "for organisations with codes like 0304%, give me total spending by month and by organisation", or "for the organisation with code 030405, give me total spending by month and by presentation_code". My data only changes once a month, so I think a materialized view makes sense for this.)
Materialized view "public.vw_spending_summary"
Column | Type | Modifiers
-------------------+-------------------------+-----------
processing_date | date |
organsiation_id | character varying(9) |
organisation_name | character varying(200) |
total_cost | double precision |
total_items | double precision |
presentation_name | character varying(1000) |
presentation_code | character varying(15) |
The query I'm using to create the materialized view is as follows:
SELECT spending_item.processing_date,
spending_item.organisation_id,
organisation.name,
spending_item.presentation_code,
spending_item.presentation_name,
SUM(spending_item.total_items) AS items,
SUM(spending_item.actual_cost) AS cost
FROM organisation, spending_item
WHERE spending_item.organisation_id=organisation.code
GROUP BY spending_item.processing_date, spending_item.presentation_name,
spending_item.presentation_code, spending_item.organisation_id,
organisation.name;
The query has been running for many hours and has not completed yet.
This is the result of an EXPLAIN ANALYZE on a more restrictive query (as above, but with WHERE spending_item.organisation_id like '0101%' added):
GroupAggregate (cost=2263051.70..2291072.45 rows=934025 width=86) (actual time=60890.505..61069.808 rows=180 loops=1)
-> Sort (cost=2263051.70..2265386.76 rows=934025 width=86) (actual time=60890.497..60945.558 rows=397717 loops=1)
Sort Key: spending_item.processing_date, spending_item.presentation_name, spending_item.presentation_code, spending_item.organisation_id, organisation.name
Sort Method: external sort Disk: 32240kB
-> Hash Join (cost=23877.71..2125733.64 rows=934025 width=86) (actual time=180.302..42251.826 rows=397717 loops=1)
Hash Cond: ((spending_item.organisation_id)::text = (organisation.code)::text)
-> Bitmap Heap Scan on spending_item (cost=23276.92..2109954.94 rows=934025 width=68) (actual time=178.521..41743.141 rows=397717 loops=1)
Filter: ((organisation_id)::text ~~ '0201%'::text)
-> Bitmap Index Scan on spending_item_organisation_id_varchar_pattern_ops_idx (cost=0.00..23043.41 rows=924684 width=0) (actual time=136.077..136.077 rows=397717 loops=1)
Index Cond: (((organisation_id)::text ~>=~ '0201'::text) AND ((organisation_id)::text ~<~ '0202'::text))
-> Hash (cost=558.13..558.13 rows=3413 width=27) (actual time=1.774..1.774 rows=3413 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 203kB
-> Seq Scan on organisation (cost=0.00..558.13 rows=3413 width=27) (actual time=0.004..0.991 rows=3413 loops=1)
Total runtime: 61089.664 ms
I'd be grateful for any suggestions on speeding this up. I've already set maintenance_work_mem, shared_buffers etc high, so I don't think there's much I can do on the settings front.
Or perhaps it would make more sense to have two different materialized views, for the two types of queries I want to run?
Please let me know if there is any other information I can provide.
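For reference, here is a minimal sketch of how the statement above might be wrapped as a materialized view, using an explicit JOIN. The output column names follow the view listing in the question (with organisation_id spelled out). The index name is hypothetical, and the index on the view is an assumption aimed at the prefix queries described above; it has not been tested against this data.

```sql
-- Sketch only: builds the summary once; the underlying data changes monthly.
CREATE MATERIALIZED VIEW vw_spending_summary AS
SELECT si.processing_date,
       si.organisation_id,
       o.name              AS organisation_name,
       si.presentation_code,
       si.presentation_name,
       SUM(si.total_items) AS total_items,
       SUM(si.actual_cost) AS total_cost
FROM spending_item AS si
JOIN organisation AS o ON o.code = si.organisation_id
GROUP BY si.processing_date, si.organisation_id, o.name,
         si.presentation_code, si.presentation_name;

-- Hypothetical index to support prefix filters such as LIKE '0304%'.
CREATE INDEX vw_spending_summary_org_idx
    ON vw_spending_summary (organisation_id varchar_pattern_ops);

-- Run after each monthly data load.
REFRESH MATERIALIZED VIEW vw_spending_summary;
```

Whether one view or two works better for the two query types would still need to be measured.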

Related

Query optimization on timestamp and group by in PostgreSQL

I want to query for my table with the following structure:
Table "public.company_geo_table"
Column | Type | Collation | Nullable | Default
--------------------+--------+-----------+----------+---------
geoname_id | bigint | | |
date | text | | |
cik | text | | |
count | bigint | | |
country_iso_code | text | | |
subdivision_1_name | text | | |
city_name | text | | |
Indexes:
"cik_country_index" btree (cik, country_iso_code)
"cik_geoname_index" btree (cik, geoname_id)
"cik_index" btree (cik)
"date_index" brin (date)
I tried the following SQL query, which needs to select a specific cik number over a time period and group by cik and geoname_id (different areas).
select cik, geoname_id, sum(count) as total
from company_geo_table
where cik = '1111111'
and date between '2016-01-01' and '2016-01-10'
group by cik, geoname_id
The EXPLAIN output shows that it only uses cik_index and date_index, and does not use cik_geoname_index. Why? Is there any way I can optimize this? Should I add new indexes? Thank you in advance.
HashAggregate (cost=117182.79..117521.42 rows=27091 width=47) (actual time=560132.903..560134.229 rows=3552 loops=1)
Group Key: cik, geoname_id
-> Bitmap Heap Scan on company_geo_table (cost=16467.77..116979.48 rows=27108 width=23) (actual time=6486.232..560114.828 rows=8175 loops=1)
Recheck Cond: ((date >= '2016-01-01'::text) AND (date <= '2016-01-10'::text) AND (cik = '1288776'::text))
Rows Removed by Index Recheck: 16621155
Heap Blocks: lossy=193098
-> BitmapAnd (cost=16467.77..16467.77 rows=27428 width=0) (actual time=6469.640..6469.641 rows=0 loops=1)
-> Bitmap Index Scan on date_index (cost=0.00..244.81 rows=7155101 width=0) (actual time=53.034..53.035 rows=8261120 loops=1)
Index Cond: ((date >= '2016-01-01'::text) AND (date <= '2016-01-10'::text))
-> Bitmap Index Scan on cik_index (cost=0.00..16209.15 rows=739278 width=0) (actual time=6370.930..6370.930 rows=676231 loops=1)
Index Cond: (cik = '1111111'::text)
Planning time: 12.909 ms
Execution time: 560135.432 ms
The row estimate is poor, and the value '1111111' is probably very common. I am not sure about the impact, but it looks like the cik column has the wrong data type (text), which may be a reason (or part of the reason) for the poor estimate.
Bitmap Heap Scan on company_geo_table (cost=16467.77..116979.48 rows=27108 width=23) (actual time=6486.232..560114.828 rows=8175 loops=1)
It looks like a composite index on (date, cik) could help.
Your problem seems to be here:
Rows Removed by Index Recheck: 16621155
Heap Blocks: lossy=193098
Your work_mem setting is too low, so PostgreSQL cannot fit a bitmap that contains one bit per table row, so it degrades to one bit per 8K block. This means that many false positive hits have to be removed during that bitmap heap scan.
Try with higher work_mem and see if that improves query performance.
The ideal index would be
CREATE INDEX ON company_geo_table (cik, date);
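A minimal sketch of how the work_mem suggestion could be tried for a single session before changing the server configuration. The 256MB figure is an assumption for illustration, not a recommendation for this machine; the query is the one from the question.

```sql
-- Raise work_mem for this session only (the value is an illustrative assumption).
SET work_mem = '256MB';

-- Re-run the query and check whether "Heap Blocks" is still reported as lossy.
EXPLAIN (ANALYZE, BUFFERS)
SELECT cik, geoname_id, sum(count) AS total
FROM company_geo_table
WHERE cik = '1111111'
  AND date BETWEEN '2016-01-01' AND '2016-01-10'
GROUP BY cik, geoname_id;

-- Return to the configured default for the rest of the session.
RESET work_mem;
```

Because date_index is a BRIN index, which as far as I know always yields page-level (lossy) bitmaps, the plan may still show lossy heap blocks after this change. In that case the (cik, date) B-tree index is likely the more effective fix.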

Why is a MAX query with an equality filter on one other column so slow in Postgresql?

I'm running into an issue in PostgreSQL (version 9.6.10) with indexes not working to speed up a MAX query with a simple equality filter on another column. Logically it seems that a simple multicolumn index on (A, B DESC) should make the query super fast.
I can't for the life of me figure out why I can't get a query to be performant regardless of what indexes are defined.
The table definition has the following:
- A primary key foo VARCHAR PRIMARY KEY (not used in the query)
- A UUID field that is NOT NULL called bar UUID
- A sequential_id column that was created as a BIGSERIAL UNIQUE type
Here's what the relevant columns look like exactly (with names modified for privacy):
Table "public.foo"
Column | Type | Modifiers
----------------------+--------------------------+--------------------------------------------------------------------------------
foo_uid | character varying | not null
bar_uid | uuid | not null
sequential_id | bigint | not null default nextval('foo_sequential_id_seq'::regclass)
Indexes:
"foo_pkey" PRIMARY KEY, btree (foo_uid)
"foo_bar_uid_sequential_id_idx", btree (bar_uid, sequential_id DESC)
"foo_sequential_id_key" UNIQUE CONSTRAINT, btree (sequential_id)
Despite having the index listed above on (bar_uid, sequential_id DESC), the following query requires an index scan and takes 100-300ms with a few million rows in the database.
The Query (get the max sequential_id for a given bar_uid):
SELECT MAX(sequential_id)
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f';
The EXPLAIN ANALYZE result doesn't use the proper index. Also, for some reason it checks if sequential_id IS NOT NULL even though it's declared as not null.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.75..0.76 rows=1 width=8) (actual time=321.110..321.110 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..0.75 rows=1 width=8) (actual time=321.106..321.106 rows=1 loops=1)
-> Index Scan Backward using foo_sequential_id_key on foo (cost=0.43..98936.43 rows=308401 width=8) (actual time=321.106..321.106 rows=1 loops=1)
Index Cond: (sequential_id IS NOT NULL)
Filter: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Rows Removed by Filter: 920761
Planning time: 0.196 ms
Execution time: 321.127 ms
(9 rows)
I can add a seemingly unnecessary GROUP BY to this query, and that speeds it up a bit, but it's still really slow for a query that should be near instantaneous with indexes defined:
SELECT MAX(sequential_id)
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'
GROUP BY bar_uid;
The EXPLAIN (ANALYZE, BUFFERS) result:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=8510.54..65953.61 rows=6 width=24) (actual time=234.529..234.530 rows=1 loops=1)
Group Key: bar_uid
Buffers: shared hit=1 read=11909
-> Bitmap Heap Scan on foo (cost=8510.54..64411.55 rows=308401 width=24) (actual time=65.259..201.969 rows=309023 loops=1)
Recheck Cond: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Heap Blocks: exact=10385
Buffers: shared hit=1 read=11909
-> Bitmap Index Scan on foo_bar_uid_sequential_id_idx (cost=0.00..8433.43 rows=308401 width=0) (actual time=63.549..63.549 rows=309023 loops=1)
Index Cond: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Buffers: shared read=1525
Planning time: 3.067 ms
Execution time: 234.589 ms
(12 rows)
Does anyone have any idea what's blocking this query from being on the order of 10 milliseconds? This should logically be instantaneous with the right index defined. It should only require the time to follow links to the leaf value in the B-Tree.
Someone asked:
What do you get for SELECT * FROM pg_stats WHERE tablename = 'foo' and attname = 'bar_uid';?
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+------------------------+-------------+-----------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------+------------------+-------------+-------------------+------------------------+----------------------
public | foo | bar_uir | f | 0 | 16 | 6 | {fa61424d-389f-4e75-ba2d-b77e6bb8491f,5c5dcae9-1b7e-4413-99a1-62fde2b89c32,50b1e842-fc32-4c2c-b00f-4a17c3c1c5fa,7ff1999c-c0ea-b700-343f-9a737f6ad659,f667b353-e199-4890-9ffd-4940ea11fe2c,b24ce968-29fd-4587-ba1f-227036ee3135} | {0.203733,0.203167,0.201567,0.195867,0.1952,0.000466667} | | -0.158093 | | |
(1 row)

Efficient DB solution for system task tracking

I'm currently working on a data tracking system. The system is a multiprocess application written in Python and works in the following manner:
1. Every S seconds it selects the N most appropriate tasks from the database (currently Postgres) and finds data for them.
2. If there are no tasks, it creates N new tasks and returns to (1).
The problem is the following: I currently have approx. 80GB of data and 36M tasks, and queries against the task table are getting slower and slower (it's the most populated and the most frequently used table).
The main performance bottleneck is the task tracking query:
LOCK TABLE task IN ACCESS EXCLUSIVE MODE;
SELECT * FROM task
WHERE line = 1
AND action = ANY(ARRAY['Find', 'Get'])
AND (stat IN ('', 'CR1') OR stat = 'ERROR' AND (actiondate <= NOW() OR actiondate IS NULL))
ORDER BY taskid, actiondate, action DESC, idtype, date ASC
LIMIT 36;
Table "public.task"
Column | Type | Modifiers
------------+-----------------------------+-------------------------------------------------
number | character varying(16) | not null
date | timestamp without time zone | default now()
stat | character varying(16) | not null default ''::character varying
idtype | character varying(16) | not null default 'container'::character varying
uri | character varying(1024) |
action | character varying(16) | not null default 'Find'::character varying
reason | character varying(4096) | not null default ''::character varying
rev | integer | not null default 0
actiondate | timestamp without time zone |
modifydate | timestamp without time zone |
line | integer |
datasource | character varying(512) |
taskid | character varying(32) |
found | integer | not null default 0
Indexes:
"task_pkey" PRIMARY KEY, btree (idtype, number)
"action_index" btree (action)
"actiondate_index" btree (actiondate)
"date_index" btree (date)
"line_index" btree (line)
"modifydate_index" btree (modifydate)
"stat_index" btree (stat)
"taskid_index" btree (taskid)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=312638.87..312638.96 rows=36 width=668) (actual time=1838.193..1838.197 rows=36 loops=1)
-> Sort (cost=312638.87..313149.54 rows=204267 width=668) (actual time=1838.192..1838.194 rows=36 loops=1)
Sort Key: taskid, actiondate, action, idtype, date
Sort Method: top-N heapsort Memory: 43kB
-> Bitmap Heap Scan on task (cost=107497.61..306337.31 rows=204267 width=668) (actual time=1013.491..1343.751 rows=914586 loops=1)
Recheck Cond: ((((stat)::text = ANY ('{"",CR1}'::text[])) OR ((stat)::text = 'ERROR'::text)) AND (line = 1))
Filter: (((action)::text = ANY ('{Find,Get}'::text[])) AND (((stat)::text = ANY ('{"",CR1}'::text[])) OR (((stat)::text = 'ERROR'::text) AND ((actiondate <= now()) OR (actiondate IS NULL)))))
Rows Removed by Filter: 133
Heap Blocks: exact=76064
-> BitmapAnd (cost=107497.61..107497.61 rows=237348 width=0) (actual time=999.457..999.457 rows=0 loops=1)
-> BitmapOr (cost=9949.15..9949.15 rows=964044 width=0) (actual time=121.936..121.936 rows=0 loops=1)
-> Bitmap Index Scan on stat_index (cost=0.00..9449.46 rows=925379 width=0) (actual time=117.791..117.791 rows=920900 loops=1)
Index Cond: ((stat)::text = ANY ('{"",CR1}'::text[]))
-> Bitmap Index Scan on stat_index (cost=0.00..397.55 rows=38665 width=0) (actual time=4.144..4.144 rows=30262 loops=1)
Index Cond: ((stat)::text = 'ERROR'::text)
-> Bitmap Index Scan on line_index (cost=0.00..97497.14 rows=9519277 width=0) (actual time=853.033..853.033 rows=9605462 loops=1)
Index Cond: (line = 1)
Planning time: 0.284 ms
Execution time: 1838.882 ms
(19 rows)
Of course, all involved fields are indexed. I'm currently thinking in two directions:
- How to optimize the query, and whether that will actually give me a lasting performance improvement (currently it takes approx. 10 seconds per query, which is unacceptable for dynamic task tracking).
- Where and how it would be more effective to store the task data. Maybe I should use another DB for such purposes: Cassandra, VoltDB or another Big Data store?
I think that the data should somehow be pre-ordered to get the current tasks as fast as possible.
Also, please keep in mind that my current volume of 80GB is most likely a minimum rather than a maximum for such a task.
Thanks in advance!
I don't quite understand your use case, but it doesn't look to me like your indexes are working too well. It looks like the query is relying mostly on the stat index. I think you need to look into a composite index something like (action, line, stat).
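A minimal sketch of the composite index suggested above. The index name is hypothetical, and whether this column order suits the query is something to verify with EXPLAIN ANALYZE rather than assume.

```sql
-- Hypothetical index name; column order follows the suggestion above.
CREATE INDEX task_action_line_stat_idx ON task (action, line, stat);

-- Refresh planner statistics, then compare the new plan with the one in the question.
ANALYZE task;
```

On a busy table of this size, CREATE INDEX CONCURRENTLY builds the index without blocking writes, at the cost of a slower build.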
Another option is to shard your data across multiple tables splitting it on some key with low cardinality. I don't use postgres but I don't think looking at another db solution is going to work better unless you know exactly what you're optimizing for.

PostgreSQL Full Text Search: why search is sooo slow?

I have a small PostgreSQL database (~3,000 rows).
I'm trying to set up a full text search on one of its text fields ('body').
The problem is that any query is extremely slow (35+ seconds!!!).
I suppose the problem comes from the fact that the DB chooses a sequential scan mode...
This is my query:
SELECT
ts_rank_cd(to_tsvector('italian', body), query),
ts_headline('italian', body, to_tsquery('torino')),
title,
location,
id_author
FROM
fulltextsearch.documents, to_tsquery('torino') as query
WHERE
(body_tsvector @@ query)
OFFSET
0
This is the EXPLAIN ANALYZE:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1129.81 rows=19 width=468) (actual time=74.059..13630.114 rows=863 loops=1)
-> Nested Loop (cost=0.00..1129.81 rows=19 width=468) (actual time=74.056..13629.342 rows=863 loops=1)
Join Filter: (documents.body_tsvector @@ query.query)
-> Function Scan on to_tsquery query (cost=0.00..0.01 rows=1 width=32) (actual time=4.606..4.608 rows=1 loops=1)
-> Seq Scan on documents (cost=0.00..1082.09 rows=3809 width=591) (actual time=0.045..48.072 rows=3809 loops=1)
Total runtime: 13630.720 ms
This is my table:
mydb=# \d+ fulltextsearch.documents;
Table "fulltextsearch.documents"
Column | Type | Modifiers | Storage | Description
---------------+-------------------+-----------------------------------------------------------------------+----------+-------------
id | integer | not null default nextval('fulltextsearch.documents_id_seq'::regclass) | plain |
id_author | integer | | plain |
body | character varying | | extended |
title | character varying | | extended |
location | character varying | | extended |
date_creation | date | | plain |
body_tsvector | tsvector | | extended |
Indexes:
"fulltextsearch_documents_tsvector_idx" gin (to_tsvector('italian'::regconfig, COALESCE(body, ''::character varying)::text))
"id_idx" btree (id)
Triggers:
body_tsvectorupdate BEFORE INSERT OR UPDATE ON fulltextsearch.documents FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('body_tsvector', 'pg_catalog.italian', 'body')
Has OIDs: no
I'm sure I'm missing something obvious....
Any clues?
=== UPDATE =======================================================================
Thanks to your suggestions, I came up with this (better) query:
SELECT
ts_rank(body_tsvector, query),
ts_headline('italian', body, query),
title,
location
FROM
fulltextsearch.documents, to_tsquery('italian', 'torino') as query
WHERE
to_tsvector('italian', coalesce(body,'')) @@ query
which is somewhat better, but still very slow (13+ seconds...).
I notice that if I comment out the ts_headline() line, the query is lightning-fast.
This is the EXPLAIN ANALYZE, which finally uses the index, but doesn't help me much...:
EXPLAIN ANALYZE SELECT
clock_timestamp() - statement_timestamp() as elapsed_time,
ts_rank(body_tsvector, query),
ts_headline('italian', body, query),
title,
location
FROM
fulltextsearch.documents, to_tsquery('italian', 'torino') as query
WHERE
to_tsvector('italian', coalesce(body,'')) @@ query
Nested Loop (cost=16.15..85.04 rows=19 width=605) (actual time=102.290..13392.161 rows=863 loops=1)
-> Function Scan on query (cost=0.00..0.01 rows=1 width=32) (actual time=0.008..0.009 rows=1 loops=1)
-> Bitmap Heap Scan on documents (cost=16.15..84.65 rows=19 width=573) (actual time=0.381..4.236 rows=863 loops=1)
Recheck Cond: (to_tsvector('italian'::regconfig, (COALESCE(body, ''::character varying))::text) @@ query.query)
-> Bitmap Index Scan on fulltextsearch_documents_tsvector_idx (cost=0.00..16.15 rows=19 width=0) (actual time=0.312..0.312 rows=863 loops=1)
Index Cond: (to_tsvector('italian'::regconfig, (COALESCE(body, ''::character varying))::text) @@ query.query)
Total runtime: 13392.717 ms
You're missing two (reasonably obvious) things:
1 You've set 'italian' in your to_tsvector() but you aren't specifying it in to_tsquery()
Keep both consistent.
2 You've indexed COALESCE(body, ...) but that isn't what you're searching against.
The planner isn't magic - you can only use an index if that's what you're searching against.
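A minimal sketch of the second point, assuming the search is meant to run against the trigger-maintained body_tsvector column shown in the table definition. The index name is hypothetical.

```sql
-- Index the stored tsvector column that the WHERE clause actually uses.
CREATE INDEX documents_body_tsvector_idx
    ON fulltextsearch.documents USING gin (body_tsvector);

-- Use the same text search configuration on both sides of the match.
SELECT title, location, id_author
FROM fulltextsearch.documents
WHERE body_tsvector @@ to_tsquery('italian', 'torino');
```

The alternative is to keep the existing expression index and repeat exactly that expression, to_tsvector('italian', coalesce(body, '')), in the WHERE clause.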
At last, with the help of your answers and comments, and with some googling, I solved it by running ts_headline() (a very heavy function, I suppose) on a subset of the full result set (the results page I'm interested in):
SELECT
id,
ts_headline('italian', body, to_tsquery('italian', 'torino')) as headline,
rank,
title,
location
FROM (
SELECT
id,
body,
title,
location,
ts_rank(body_tsvector, query) as rank
FROM
fulltextsearch.documents, to_tsquery('italian', 'torino') as query
WHERE
to_tsvector('italian', coalesce(body,'')) @@ query
LIMIT 10
OFFSET 0
) as s
I solved the problem by precomputing the ts_rank_cd and storing it in a table for popular terms (those with high occurrence counts) in the corpus. The search looks at this table to get the sorted document rank for a query term. If the term is not there (as for less popular terms), it falls back to computing ts_rank_cd on the fly.
Please take a look at this post.
https://dba.stackexchange.com/a/149701

A slow SQL statement: is there any way to optimize it?

Our application has a very slow statement that takes more than 11 seconds, so I want to know whether there is any way to optimize it.
The SQL statement
SELECT id FROM mapfriends.cell_forum_topic WHERE id in (
SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid )
AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
id
---------
2471959
2382296
1535967
2432006
2367281
2159706
1501759
1549304
2179763
1598043
(10 rows)
Time: 11444.976 ms
Plan
friends=> explain SELECT id FROM friends.cell_forum_topic WHERE id in (
friends(> SELECT topicid FROM friends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid)
friends-> AND categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443.15..1443.15 rows=2 width=12)
-> Sort (cost=1443.15..1443.15 rows=2 width=12)
Sort Key: cell_forum_topic.restoretime
-> Nested Loop (cost=1434.28..1443.14 rows=2 width=12)
-> HashAggregate (cost=1434.28..1434.30 rows=2 width=4)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1430.49 rows=1516 width=4)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
(10 rows)
Time: 1.109 ms
Indexes
friends=> \d cell_forum_item
Table "friends.cell_forum_item"
Column | Type | Modifiers
---------+--------------------------------+--------------------------------------------------------------
id | integer | not null default nextval('cell_forum_item_id_seq'::regclass)
topicid | integer | not null
skyid | integer | not null
content | character varying(200) |
addtime | timestamp(0) without time zone | default now()
ischeck | boolean |
Indexes:
"cell_forum_item_pkey" PRIMARY KEY, btree (id)
"cell_forum_item_idx" btree (topicid, skyid)
"cell_forum_item_idx_1" btree (topicid, id)
"cell_forum_item_idx_skyid" btree (skyid)
friends=> \d cell_forum_topic
Table "friends.cell_forum_topic"
Column | Type | Modifiers
-------------+--------------------------------+-------------------------------------------------------------------------------------
id | integer | not null default nextval(('"friends"."cell_forum_topic_id_seq"'::text)::regclass)
categoryid | integer | not null
topic | character varying | not null
content | character varying | not null
skyid | integer | not null
addtime | timestamp(0) without time zone | default now()
reference | integer | default 0
restore | integer | default 0
restoretime | timestamp(0) without time zone | default now()
locked | boolean | default false
settop | boolean | default false
hidden | boolean | default false
feature | boolean | default false
picid | integer | default 29249
managerid | integer |
imageid | integer | default 0
pass | boolean | default false
ischeck | boolean |
Indexes:
"cell_forum_topic_pkey" PRIMARY KEY, btree (id)
"idx_cell_forum_topic_1" btree (categoryid, settop, hidden, restoretime, skyid)
"idx_cell_forum_topic_2" btree (categoryid, hidden, restoretime, skyid)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
"idx_cell_forum_topic_4" btree (categoryid, hidden, restore)
"idx_cell_forum_topic_5" btree (categoryid, hidden, restoretime, feature)
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
Explain analyze
mapfriends=> explain analyze SELECT id FROM mapfriends.cell_forum_topic
mapfriends-> join (SELECT topicid FROM mapfriends.cell_forum_item WHERE skyid=103230293 GROUP BY topicid) as tmp
mapfriends-> on mapfriends.cell_forum_topic.id=tmp.topicid
mapfriends-> where categoryid=29 AND hidden=false ORDER BY restoretime DESC LIMIT 10 OFFSET 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.006..18016.013 rows=10 loops=1)
-> Sort (cost=1446.89..1446.90 rows=2 width=12) (actual time=18016.001..18016.002 rows=10 loops=1)
Sort Key: cell_forum_topic.restoretime
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1438.02..1446.88 rows=2 width=12) (actual time=16988.492..18015.869 rows=20 loops=1)
-> HashAggregate (cost=1438.02..1438.04 rows=2 width=4) (actual time=15446.735..15447.243 rows=610 loops=1)
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
Index Cond: (skyid = 103230293)
-> Index Scan using cell_forum_topic_pkey on cell_forum_topic (cost=0.00..4.40 rows=1 width=12) (actual time=4.210..4.210 rows=0 loops=610)
Index Cond: (cell_forum_topic.id = cell_forum_item.topicid)
Filter: ((NOT cell_forum_topic.hidden) AND (cell_forum_topic.categoryid = 29))
Total runtime: 18019.461 ms
Could you give us some more information about the tables (the statistics) and the configuration?
SELECT version();
SELECT category, name, setting FROM pg_settings WHERE name IN('effective_cache_size', 'enable_seqscan', 'shared_buffers');
SELECT * FROM pg_stat_user_tables WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stat_user_indexes WHERE relname IN('cell_forum_topic', 'cell_forum_item');
SELECT * FROM pg_stats WHERE tablename IN('cell_forum_topic', 'cell_forum_item');
And before getting this data, use ANALYZE.
It looks like you have a problem with an index; this is where the query spends almost all of its time:
-> Index Scan using cell_forum_item_idx_skyid on cell_forum_item (cost=0.00..1434.22 rows=1520 width=4) (actual time=302.378..15429.782 rows=7133 loops=1)
If you use VACUUM FULL on a regular basis (NOT RECOMMENDED!), index bloat might be your problem. A REINDEX might be a good idea, just to be sure:
REINDEX TABLE cell_forum_item;
And talking about indexes, you can drop a couple of them; these are redundant:
"idx_cell_forum_topic_6" btree (categoryid, settop, hidden, restoretime)
"idx_cell_forum_topic_3" btree (categoryid, hidden, restoretime)
Other indexes start with the same columns and can be used by the database instead.
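A minimal sketch of dropping the two indexes named above. The schema name is an assumption (the question shows both mapfriends and friends), so adjust it to match your database. Checking idx_scan in pg_stat_user_indexes first is a reasonable precaution.

```sql
-- Both are leading-column prefixes of other indexes on cell_forum_topic.
-- The schema name is an assumption; the question shows both mapfriends and friends.
DROP INDEX mapfriends.idx_cell_forum_topic_6;  -- prefix of idx_cell_forum_topic_1
DROP INDEX mapfriends.idx_cell_forum_topic_3;  -- prefix of idx_cell_forum_topic_2 and _5
```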
It looks like you have a couple of problems:
- autovacuum is turned off or it's way behind. That last autovacuum was on 2010-12-02 and you have 256734 dead tuples in one table and 451430 dead ones in the other.... You have to do something about this, this is a serious problem.
- When autovacuum is working again, you have to do a VACUUM FULL and a REINDEX to force a table rewrite and get rid of all empty space in your tables.
- After fixing the vacuum problem, you have to analyze as well: the database expects 1520 results but it gets 7133 results. This could be a problem with statistics, maybe you have to increase the STATISTICS.
- The query itself needs some rewriting as well: It gets 7133 results but it needs only 610 results. Over 90% of the results are lost... And getting these 7133 takes a lot of time, over 15 seconds. Get rid of the subquery by using a JOIN without the GROUP BY or use EXISTS, also without the GROUP BY.
But first get autovacuum back on track, before you get new or other problems.
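A minimal sketch of the EXISTS rewrite mentioned above, using the table and column names from the question. The aliases t and i are introduced here for readability, and the result should be checked against the original query before relying on it.

```sql
-- Same filter and ordering as the original, with EXISTS instead of IN (... GROUP BY ...).
SELECT t.id
FROM mapfriends.cell_forum_topic AS t
WHERE t.categoryid = 29
  AND t.hidden = false
  AND EXISTS (
        SELECT 1
        FROM mapfriends.cell_forum_item AS i
        WHERE i.topicid = t.id
          AND i.skyid = 103230293
      )
ORDER BY t.restoretime DESC
LIMIT 10 OFFSET 0;
```

The existing cell_forum_item_idx index on (topicid, skyid) should be able to serve the EXISTS probe, though the planner may still prefer a different plan.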
The problem isn't due to a lack of caching of the query plan; it is most likely due to the choice of plan, caused by a lack of appropriate indexes.