Not sure if PostgreSQL cube GiST index is working

I'm trying to figure out whether the GiST index on my table's cube column is being used for my nearest-neighbor query (Euclidean metric). My cube values are 75-dimensional vectors.
Table:
\d+ reduced_features
Table "public.reduced_features"
Column | Type | Modifiers | Storage | Stats target | Description
----------+--------+-----------+---------+--------------+-------------
id | bigint | not null | plain | |
features | cube | not null | plain | |
Indexes:
"reduced_features_id_idx" UNIQUE, btree (id)
"reduced_features_features_idx" gist (features)
Here is my query:
explain analyze select id from reduced_features order by features <-> (select features from reduced_features where id = 198990) limit 10;
Results:
QUERY PLAN
---------------
Limit (cost=8.58..18.53 rows=10 width=16) (actual time=0.821..35.987 rows=10 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using reduced_features_id_idx on reduced_features reduced_features_1 (cost=0.29..8.31 rows=1 width=608) (actual time=0.014..0.015 rows=1 loops=1)
Index Cond: (id = 198990)
-> Index Scan using reduced_features_features_idx on reduced_features (cost=0.28..36482.06 rows=36689 width=16) (actual time=0.819..35.984 rows=10 loops=1)
Order By: (features <-> $0)
Planning time: 0.117 ms
Execution time: 36.232 ms
I have 36689 total records in my table. Is my index working?
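One way to sanity-check this (a sketch; SET LOCAL enable_indexscan only disables index-scan plans inside the transaction, nothing is dropped) is to compare against the same query with index scans switched off:
BEGIN;
SET LOCAL enable_indexscan = off;
explain analyze select id from reduced_features order by features <-> (select features from reduced_features where id = 198990) limit 10;
ROLLBACK;
If the timing gets much worse (a sequential scan plus a sort over all rows by distance), the KNN-GiST index scan in the original plan is doing its job.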

Related

Query optimization on timestamp and group by in PostgreSQL

I want to query a table with the following structure:
Table "public.company_geo_table"
Column | Type | Collation | Nullable | Default
--------------------+--------+-----------+----------+---------
geoname_id | bigint | | |
date | text | | |
cik | text | | |
count | bigint | | |
country_iso_code | text | | |
subdivision_1_name | text | | |
city_name | text | | |
Indexes:
"cik_country_index" btree (cik, country_iso_code)
"cik_geoname_index" btree (cik, geoname_id)
"cik_index" btree (cik)
"date_index" brin (date)
I tried the following SQL query, which needs to look up a specific cik number during a time period and group by cik and geoname_id (different areas).
select cik, geoname_id, sum(count) as total
from company_geo_table
where cik = '1111111'
and date between '2016-01-01' and '2016-01-10'
group by cik, geoname_id
The EXPLAIN output shows that only the cik index and the date index are used, not the cik_geoname_index. Why? Is there any way I can optimize my solution? Any new indexes? Thank you in advance.
HashAggregate (cost=117182.79..117521.42 rows=27091 width=47) (actual time=560132.903..560134.229 rows=3552 loops=1)
Group Key: cik, geoname_id
-> Bitmap Heap Scan on company_geo_table (cost=16467.77..116979.48 rows=27108 width=23) (actual time=6486.232..560114.828 rows=8175 loops=1)
Recheck Cond: ((date >= '2016-01-01'::text) AND (date <= '2016-01-10'::text) AND (cik = '1288776'::text))
Rows Removed by Index Recheck: 16621155
Heap Blocks: lossy=193098
-> BitmapAnd (cost=16467.77..16467.77 rows=27428 width=0) (actual time=6469.640..6469.641 rows=0 loops=1)
-> Bitmap Index Scan on date_index (cost=0.00..244.81 rows=7155101 width=0) (actual time=53.034..53.035 rows=8261120 loops=1)
Index Cond: ((date >= '2016-01-01'::text) AND (date <= '2016-01-10'::text))
-> Bitmap Index Scan on cik_index (cost=0.00..16209.15 rows=739278 width=0) (actual time=6370.930..6370.930 rows=676231 loops=1)
Index Cond: (cik = '1111111'::text)
Planning time: 12.909 ms
Execution time: 560135.432 ms
The row estimate is poor (and the value '1111111' is probably used very often; I am not sure about the impact). It also looks like the cik column has the wrong data type (text), which can be a reason (or part of the reason) for the bad estimate:
Bitmap Heap Scan on company_geo_table (cost=16467.77..116979.48 rows=27108 width=23) (actual time=6486.232..560114.828 rows=8175 loops=1)
It looks like a composite index on (date, cik) could help.
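A sketch (the index name here is made up):
CREATE INDEX company_geo_date_cik_idx ON company_geo_table (date, cik);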
Your problem seems to be here:
Rows Removed by Index Recheck: 16621155
Heap Blocks: lossy=193098
Your work_mem setting is too low, so PostgreSQL cannot fit a bitmap that contains one bit per table row, so it degrades to one bit per 8K block. This means that many false positive hits have to be removed during that bitmap heap scan.
Try a higher work_mem and see if that improves query performance.
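For example (the value is only illustrative; try it per session before changing it globally):
SET work_mem = '256MB';
explain analyze
select cik, geoname_id, sum(count) as total
from company_geo_table
where cik = '1111111'
and date between '2016-01-01' and '2016-01-10'
group by cik, geoname_id;
With enough work_mem the bitmap stays exact (Heap Blocks: exact=... instead of lossy=...) and the recheck work disappears.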
The ideal index would be
CREATE INDEX ON company_geo_table (cik, date);

Why is a MAX query with an equality filter on one other column so slow in PostgreSQL?

I'm running into an issue in PostgreSQL (version 9.6.10) with indexes not working to speed up a MAX query with a simple equality filter on another column. Logically it seems that a simple multicolumn index on (A, B DESC) should make the query super fast.
I can't for the life of me figure out why I can't get a query to be performant regardless of what indexes are defined.
The table definition has the following:
- A primary key foo VARCHAR PRIMARY KEY (not used in the query)
- A UUID field that is NOT NULL called bar UUID
- A sequential_id column that was created as a BIGSERIAL UNIQUE type
Here's what the relevant columns look like exactly (with names modified for privacy):
Table "public.foo"
Column | Type | Modifiers
----------------------+--------------------------+--------------------------------------------------------------------------------
foo_uid | character varying | not null
bar_uid | uuid | not null
sequential_id | bigint | not null default nextval('foo_sequential_id_seq'::regclass)
Indexes:
"foo_pkey" PRIMARY KEY, btree (foo_uid)
"foo_bar_uid_sequential_id_idx", btree (bar_uid, sequential_id DESC)
"foo_sequential_id_key" UNIQUE CONSTRAINT, btree (sequential_id)
Despite having the index listed above on (bar_uid, sequential_id DESC), the following query ends up scanning a different index and takes 100-300ms with a few million rows in the database.
The Query (get the max sequential_id for a given bar_uid):
SELECT MAX(sequential_id)
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f';
The EXPLAIN ANALYZE result doesn't use the proper index. Also, for some reason it checks if sequential_id IS NOT NULL even though it's declared as not null.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.75..0.76 rows=1 width=8) (actual time=321.110..321.110 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..0.75 rows=1 width=8) (actual time=321.106..321.106 rows=1 loops=1)
-> Index Scan Backward using foo_sequential_id_key on foo (cost=0.43..98936.43 rows=308401 width=8) (actual time=321.106..321.106 rows=1 loops=1)
Index Cond: (sequential_id IS NOT NULL)
Filter: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Rows Removed by Filter: 920761
Planning time: 0.196 ms
Execution time: 321.127 ms
(9 rows)
I can add a seemingly unnecessary GROUP BY to this query, and that speeds it up a bit, but it's still really slow for a query that should be near instantaneous with indexes defined:
SELECT MAX(sequential_id)
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'
GROUP BY bar_uid;
The EXPLAIN (ANALYZE, BUFFERS) result:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=8510.54..65953.61 rows=6 width=24) (actual time=234.529..234.530 rows=1 loops=1)
Group Key: bar_uid
Buffers: shared hit=1 read=11909
-> Bitmap Heap Scan on foo (cost=8510.54..64411.55 rows=308401 width=24) (actual time=65.259..201.969 rows=309023 loops=1)
Recheck Cond: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Heap Blocks: exact=10385
Buffers: shared hit=1 read=11909
-> Bitmap Index Scan on foo_bar_uid_sequential_id_idx (cost=0.00..8433.43 rows=308401 width=0) (actual time=63.549..63.549 rows=309023 loops=1)
Index Cond: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Buffers: shared read=1525
Planning time: 3.067 ms
Execution time: 234.589 ms
(12 rows)
Does anyone have any idea what's blocking this query from being on the order of 10 milliseconds? This should logically be instantaneous with the right index defined. It should only require the time to follow links to the leaf value in the B-Tree.
Someone asked:
What do you get for SELECT * FROM pg_stats WHERE tablename = 'foo' and attname = 'bar_uid';?
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | foo
attname                | bar_uid
inherited              | f
null_frac              | 0
avg_width              | 16
n_distinct             | 6
most_common_vals       | {fa61424d-389f-4e75-ba2d-b77e6bb8491f,5c5dcae9-1b7e-4413-99a1-62fde2b89c32,50b1e842-fc32-4c2c-b00f-4a17c3c1c5fa,7ff1999c-c0ea-b700-343f-9a737f6ad659,f667b353-e199-4890-9ffd-4940ea11fe2c,b24ce968-29fd-4587-ba1f-227036ee3135}
most_common_freqs      | {0.203733,0.203167,0.201567,0.195867,0.1952,0.000466667}
histogram_bounds       |
correlation            | -0.158093
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
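For comparison, the same value can be fetched without MAX(); assuming the planner is willing to walk the (bar_uid, sequential_id DESC) index, this form only has to descend to a single leaf entry (it returns no row instead of NULL when there is no match):
SELECT sequential_id
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'
ORDER BY sequential_id DESC
LIMIT 1;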

Why do different query values produce different index algorithms?

I have a query, and I created indexes specifically for it. But I just discovered that with some particular values the query stops running fast and does a full scan.
Here is the case of fast execution:
explain analyze SELECT
v.valtr_id,
v.block_num,
v.from_id,
v.to_id,
v.from_balance::text,
v.to_balance::text
FROM value_transfer v
WHERE
(v.block_num<=2748053) AND
(
(v.to_id=639291) OR
(v.from_id=639291)
)
ORDER BY
v.block_num DESC,v.valtr_id DESC
LIMIT 1
Limit (cost=23054.03..23054.03 rows=1 width=30) (actual time=1.464..1.465 rows=1 loops=1)
-> Sort (cost=23054.03..23068.94 rows=5964 width=30) (actual time=1.462..1.462 rows=1 loops=1)
Sort Key: block_num DESC, valtr_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on value_transfer v (cost=144.85..23024.21 rows=5964 width=30) (actual time=1.397..1.437 rows=3 loops=1)
Recheck Cond: ((to_id = 639291) OR (from_id = 639291))
Filter: (block_num <= 2748053)
Heap Blocks: exact=3
-> BitmapOr (cost=144.85..144.85 rows=5964 width=0) (actual time=1.339..1.339 rows=0 loops=1)
-> Bitmap Index Scan on vt_to_id_idx (cost=0.00..40.42 rows=1580 width=0) (actual time=0.755..0.755 rows=1 loops=1)
Index Cond: (to_id = 639291)
-> Bitmap Index Scan on vt_from_id_idx (cost=0.00..101.45 rows=4384 width=0) (actual time=0.580..0.580 rows=2 loops=1)
Index Cond: (from_id = 639291)
Planning time: 0.499 ms
Execution time: 1.556 ms
(15 rows)
But if I put the value 199658 as input to my query, it uses a different search algorithm:
explain analyze SELECT
v.valtr_id,
v.block_num,
v.from_id,
v.to_id,
v.from_balance::text,
v.to_balance::text
FROM value_transfer v
WHERE
(v.block_num<=2748053) AND
(
(v.to_id=199658) OR
(v.from_id=199658)
)
ORDER BY
v.block_num DESC,v.valtr_id DESC
LIMIT 1 ;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..6462.99 rows=1 width=30) (actual time=614109.855..614109.856 rows=1 loops=1)
-> Index Scan Backward using bnum_valtr_idx on value_transfer v (cost=0.57..200845479.66 rows=31079 width=30) (actual time=614109.853..614109.853 rows=1 loops=1)
Index Cond: (block_num <= 2748053)
Filter: ((to_id = 199658) OR (from_id = 199658))
Rows Removed by Filter: 101190609
Planning time: 0.515 ms
Execution time: 614109.920 ms
(7 rows)
Why is this happening? I thought that once you have created the indexes for your query, execution would always take the same path, but that is not the case. How can I make sure Postgres always uses the same plan for every search?
I even thought this might be happening because the indexes weren't built cleanly, so I rebuilt the main index:
postgres=> drop index bnum_valtr_idx;
DROP INDEX
postgres=> CREATE INDEX bnum_valtr_idx ON public.value_transfer USING btree (block_num DESC, valtr_id DESC);
CREATE INDEX
postgres=>
However, this didn't change anything.
My table definitions are:
CREATE TABLE value_transfer (
valtr_id BIGSERIAL PRIMARY KEY,
tx_id BIGINT REFERENCES transaction(tx_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_id INT REFERENCES block(block_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_num INT NOT NULL,
from_id INT NOT NULL,
to_id INT NOT NULL,
value NUMERIC DEFAULT 0,
from_balance NUMERIC DEFAULT 0,
to_balance NUMERIC DEFAULT 0,
kind CHAR NOT NULL,
depth INT DEFAULT 0,
error TEXT NOT NULL
);
postgres=> SELECT * FROM pg_indexes WHERE tablename = 'value_transfer';
schemaname | tablename | indexname | tablespace | indexdef
------------+----------------+---------------------+------------+--------------------------------------------------------------------------------------------------
public | value_transfer | bnum_valtr_idx | | CREATE INDEX bnum_valtr_idx ON public.value_transfer USING btree (block_num DESC, valtr_id DESC)
public | value_transfer | value_transfer_pkey | | CREATE UNIQUE INDEX value_transfer_pkey ON public.value_transfer USING btree (valtr_id)
public | value_transfer | vt_tx_from_idx | | CREATE INDEX vt_tx_from_idx ON public.value_transfer USING btree (tx_id)
public | value_transfer | vt_block_num_idx | | CREATE INDEX vt_block_num_idx ON public.value_transfer USING btree (block_num)
public | value_transfer | vt_from_id_idx | | CREATE INDEX vt_from_id_idx ON public.value_transfer USING btree (from_id)
public | value_transfer | vt_to_id_idx | | CREATE INDEX vt_to_id_idx ON public.value_transfer USING btree (to_id)
public | value_transfer | vt_block_id_idx | | CREATE INDEX vt_block_id_idx ON public.value_transfer USING btree (block_id)
(7 rows)
postgres=>
It could be that one value appears mostly in one column and vice versa. Regardless, an OR over different columns is notorious for causing performance problems: a single index scan can only use one index, but the OR would require two indexes to check both columns quickly, so one column gets checked via its index while the other requires a scan.
The way around this problem is to break the query into a union.
Try this:
SELECT * FROM (
SELECT
valtr_id,
block_num,
from_id,
to_id,
from_balance::text,
to_balance::text
FROM value_transfer
WHERE block_num<=2748053
AND to_id=199658
UNION ALL
SELECT
valtr_id,
block_num,
from_id,
to_id,
from_balance::text,
to_balance::text
FROM value_transfer
WHERE block_num<=2748053
AND from_id=199658
) x
ORDER BY block_num DESC, valtr_id DESC
LIMIT 1
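If each branch should also avoid sorting many rows, composite indexes along these lines might help (a sketch; the index names are made up):
CREATE INDEX vt_to_block_valtr_idx ON value_transfer (to_id, block_num DESC, valtr_id DESC);
CREATE INDEX vt_from_block_valtr_idx ON value_transfer (from_id, block_num DESC, valtr_id DESC);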

Slow SQL select with index on Postgres

I have a production database which is replicated to another host using londist. The table looks like this:
# \d+ usermessage
Table "public.usermessage"
Column | Type | Modifiers | Description
-------------------+-------------------+-----------+-------------
id | bigint | not null |
subject | character varying | |
message | character varying | |
read | boolean | |
timestamp | bigint | |
owner | bigint | |
sender | bigint | |
recipient | bigint | |
dao_created | bigint | |
dao_updated | bigint | |
type | integer | |
replymessageid | character varying | |
originalmessageid | character varying | |
replied | boolean | |
mheader | boolean | |
mbody | boolean | |
Indexes:
"usermessage_pkey" PRIMARY KEY, btree (id)
"usermessage_owner_key" btree (owner)
"usermessage_recipient_key" btree (recipient)
"usermessage_timestamp_key" btree ("timestamp")
"usermessage_type_key" btree (type)
Has OIDs: no
If executed on the replicated database, the select is fast as expected; if executed on the production host, it's horribly slow. To make things stranger, not all timestamps are slow; some of them are fast on both hosts. The filesystem and the storage behind the production host are fine and not under heavy usage. Any ideas?
replication# explain analyse SELECT COUNT(id) FROM usermessage WHERE owner = 1234567 AND timestamp > 1362077127010;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=263.37..263.38 rows=1 width=8) (actual time=0.059..0.060 rows=1 loops=1)
-> Bitmap Heap Scan on usermessage (cost=259.35..263.36 rows=1 width=8) (actual time=0.055..0.055 rows=0 loops=1)
Recheck Cond: ((owner = 1234567) AND ("timestamp" > 1362077127010::bigint))
-> BitmapAnd (cost=259.35..259.35 rows=1 width=0) (actual time=0.054..0.054 rows=0 loops=1)
-> Bitmap Index Scan on usermessage_owner_key (cost=0.00..19.27 rows=241 width=0) (actual time=0.032..0.032 rows=33 loops=1)
Index Cond: (owner = 1234567)
-> Bitmap Index Scan on usermessage_timestamp_key (cost=0.00..239.82 rows=12048 width=0) (actual time=0.013..0.013 rows=0 loops=1)
Index Cond: ("timestamp" > 1362077127010::bigint)
Total runtime: 0.103 ms
(9 rows)
production# explain analyse SELECT COUNT(id) FROM usermessage WHERE owner = 1234567 AND timestamp > 1362077127010;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=267.39..267.40 rows=1 width=8) (actual time=47536.590..47536.590 rows=1 loops=1)
-> Bitmap Heap Scan on usermessage (cost=263.37..267.38 rows=1 width=8) (actual time=47532.520..47536.579 rows=3 loops=1)
Recheck Cond: ((owner = 1234567) AND ("timestamp" > 1362077127010::bigint))
-> BitmapAnd (cost=263.37..263.37 rows=1 width=0) (actual time=47532.334..47532.334 rows=0 loops=1)
-> Bitmap Index Scan on usermessage_owner_key (cost=0.00..21.90 rows=168 width=0) (actual time=0.123..0.123 rows=46 loops=1)
Index Cond: (owner = 1234567)
-> Bitmap Index Scan on usermessage_timestamp_key (cost=0.00..241.22 rows=12209 width=0) (actual time=47530.255..47530.255 rows=5255617 loops=1)
Index Cond: ("timestamp" > 1362077127010::bigint)
Total runtime: 47536.668 ms
(9 rows)
I am less familiar with PostgreSQL than MySQL, but
(actual time=0.013..0.013 rows=0 loops=1)
and
(actual time=47530.255..47530.255 rows=5255617 loops=1)
suggest to me that your production DB has more data, given that the row counts are drastically different.
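If you want to verify that, comparing the raw match counts on both hosts should show it directly, for example:
-- run on both the replica and the production host
SELECT count(*) FROM usermessage WHERE "timestamp" > 1362077127010;
SELECT count(*) FROM usermessage WHERE owner = 1234567;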

Making Postgres Query Faster. More Indexes?

I'm running Geodjango/Postgres 9.1/PostGIS and I'm trying to get the following query (and others like it) to run faster.
[query snipped for brevity]
SELECT "crowdbreaks_incomingkeyword"."keyword_id"
, COUNT("crowdbreaks_incomingkeyword"."keyword_id") AS "cnt"
FROM "crowdbreaks_incomingkeyword"
INNER JOIN "crowdbreaks_tweet"
ON ("crowdbreaks_incomingkeyword"."tweet_id"
= "crowdbreaks_tweet"."tweet_id")
LEFT OUTER JOIN "crowdbreaks_place"
ON ("crowdbreaks_tweet"."place_id"
= "crowdbreaks_place"."place_id")
WHERE (("crowdbreaks_tweet"."coordinates"
# ST_GeomFromEWKB(E'\\001 ... \\000\\000\\000\\0008#'::bytea)
OR ST_Overlaps("crowdbreaks_place"."bounding_box"
, ST_GeomFromEWKB(E'\\001...00\\000\\0008#'::bytea)
))
AND "crowdbreaks_tweet"."created_at" > E'2012-04-17 15:46:12.109893'
AND "crowdbreaks_tweet"."created_at" < E'2012-04-18 15:46:12.109899' )
GROUP BY "crowdbreaks_incomingkeyword"."keyword_id"
, "crowdbreaks_incomingkeyword"."keyword_id"
;
Here is what the crowdbreaks_tweet table looks like:
\d+ crowdbreaks_tweet;
Table "public.crowdbreaks_tweet"
Column | Type | Modifiers | Storage | Description
---------------+--------------------------+-----------+----------+-------------
tweet_id | bigint | not null | plain |
tweeter | bigint | not null | plain |
text | text | not null | extended |
created_at | timestamp with time zone | not null | plain |
country_code | character varying(3) | | extended |
place_id | character varying(32) | | extended |
coordinates | geometry | | main |
Indexes:
"crowdbreaks_tweet_pkey" PRIMARY KEY, btree (tweet_id)
"crowdbreaks_tweet_coordinates_id" gist (coordinates)
"crowdbreaks_tweet_created_at" btree (created_at)
"crowdbreaks_tweet_place_id" btree (place_id)
"crowdbreaks_tweet_place_id_like" btree (place_id varchar_pattern_ops)
Check constraints:
"enforce_dims_coordinates" CHECK (st_ndims(coordinates) = 2)
"enforce_geotype_coordinates" CHECK (geometrytype(coordinates) = 'POINT'::text OR coordinates IS NULL)
"enforce_srid_coordinates" CHECK (st_srid(coordinates) = 4326)
Foreign-key constraints:
"crowdbreaks_tweet_place_id_fkey" FOREIGN KEY (place_id) REFERENCES crowdbreaks_place(place_id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "crowdbreaks_incomingkeyword" CONSTRAINT "crowdbreaks_incomingkeyword_tweet_id_fkey" FOREIGN KEY (tweet_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED
TABLE "crowdbreaks_tweetanswer" CONSTRAINT "crowdbreaks_tweetanswer_tweet_id_id_fkey" FOREIGN KEY (tweet_id_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no
And here is the explain analyze for the query:
HashAggregate (cost=184022.03..184023.18 rows=115 width=4) (actual time=6381.707..6381.769 rows=62 loops=1)
-> Hash Join (cost=103857.48..183600.24 rows=84357 width=4) (actual time=1745.449..6377.505 rows=3453 loops=1)
Hash Cond: (crowdbreaks_incomingkeyword.tweet_id = crowdbreaks_tweet.tweet_id)
-> Seq Scan on crowdbreaks_incomingkeyword (cost=0.00..36873.97 rows=2252597 width=12) (actual time=0.008..2136.839 rows=2252597 loops=1)
-> Hash (cost=102535.68..102535.68 rows=80544 width=8) (actual time=1744.815..1744.815 rows=3091 loops=1)
Buckets: 4096 Batches: 4 Memory Usage: 32kB
-> Hash Left Join (cost=16574.93..102535.68 rows=80544 width=8) (actual time=112.551..1740.651 rows=3091 loops=1)
Hash Cond: ((crowdbreaks_tweet.place_id)::text = (crowdbreaks_place.place_id)::text)
Filter: ((crowdbreaks_tweet.coordinates # '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) OR ((crowdbreaks_place.bounding_box && '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) AND _st_overlaps(crowdbreaks_place.bounding_box, '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry)))
-> Bitmap Heap Scan on crowdbreaks_tweet (cost=15874.18..67060.28 rows=747873 width=125) (actual time=96.012..940.462 rows=736784 loops=1)
Recheck Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone))
-> Bitmap Index Scan on crowdbreaks_tweet_created_at (cost=0.00..15687.22 rows=747873 width=0) (actual time=94.259..94.259 rows=736784 loops=1)
Index Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone))
-> Hash (cost=217.11..217.11 rows=6611 width=469) (actual time=15.926..15.926 rows=6611 loops=1)
Buckets: 1024 Batches: 4 Memory Usage: 259kB
-> Seq Scan on crowdbreaks_place (cost=0.00..217.11 rows=6611 width=469) (actual time=0.005..6.908 rows=6611 loops=1)
Total runtime: 6381.903 ms
(17 rows)
That's a pretty bad runtime for the query. Ideally, I'd like to get results back in a second or two.
I've increased shared_buffers on Postgres to 2GB (I have 8GB of RAM) but other than that I'm not quite sure what to do. What are my options? Should I do fewer joins? Are there any other indexes I can throw on there? The sequential scan on crowdbreaks_incomingkeyword doesn't make sense to me. It's a table of foreign keys to other tables, and thus has indexes on it.
Judging from your comment I would try two things:
Raise statistics target for involved columns (and run ANALYZE).
ALTER TABLE tbl ALTER COLUMN column SET STATISTICS 1000;
The data distribution may be uneven. A bigger sample may provide the query planner with more accurate estimates.
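A concrete version for this query could look like the following (the column choices are an assumption based on the filter and join columns; 1000 is simply an aggressive target):
ALTER TABLE crowdbreaks_tweet ALTER COLUMN created_at SET STATISTICS 1000;
ALTER TABLE crowdbreaks_tweet ALTER COLUMN place_id SET STATISTICS 1000;
ALTER TABLE crowdbreaks_incomingkeyword ALTER COLUMN tweet_id SET STATISTICS 1000;
ANALYZE crowdbreaks_tweet;
ANALYZE crowdbreaks_incomingkeyword;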
Play with the cost settings in postgresql.conf. Your sequential scans might need to be more expensive compared to your index scans to give good estimates.
Try lowering cpu_index_tuple_cost, and set effective_cache_size to something as high as three quarters of your total RAM for a dedicated DB server.
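A sketch of what to experiment with per session before touching postgresql.conf (the numbers are illustrative, not tuned):
SET cpu_index_tuple_cost = 0.001;   -- default is 0.005
SET effective_cache_size = '6GB';   -- roughly three quarters of 8GB RAM
Then re-run the EXPLAIN ANALYZE and see whether the planner moves away from the sequential scan.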