Optimizing a postgis query - why is 2nd index not being used? - postgresql

We have a table with tens of millions of polygons and we have this index:
CREATE INDEX IF NOT EXISTS polygons_geog_idx ON polygons USING GIST(geog);
That let us query the DB really efficiently, like so:
SELECT * FROM polygons WHERE st_dwithin('SRID=4326;POINT(-1 50)'::geography, geog, 500);
Now due to the business requirements, we need to return only biggest 200 polygons. Easily doable like with:
LIMIT 200
ORDER BY st_area(geog)
Full Query:
SELECT gid, st_area(geog) as size FROM polygons WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500) ORDER BY st_area(geog) DESC LIMIT 200.
Because of the order by and select our query slows down by 10x. I thought it will be easily fixable by adding another index like seen in this SO Answer: CREATE INDEX polygons_geog_area_idx ON polygons (st_area(geog));
But polygons_geog_area_idx doesn't seem to be picked up:
Sort (cost=8.23..8.23 rows=1 width=12) (actual time=133.755..142.427 rows=2325 loops=1)
Sort Key: (st_area(geog, true))
Sort Method: quicksort Memory: 205kB
-> Index Scan using polygons_geog_idx on polygons (cost=0.14..8.22 rows=1 width=12) (actual time=0.468..121.974 rows=2325 loops=1)
Index Cond: (geog && '0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography)
Filter: (('0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography && _st_expand(geog, '500'::double precision)) AND _st_dwithin(geog, '0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography, '500'::double precision, true))
Rows Removed by Filter: 3
Planning Time: 0.157 ms
Execution Time: 151.196 ms
(note: this is on development dataset, much smaller than actual dataset this will run on later)
What am I missing? Can you even use 2 indexes like I want?

PostgreSQL cannot combine two indexes in this way, one for the order and one for selectivity.
In order to sort by the area, it first needs to compute the area. The sort itself is fast (taking only 15% of the time) so it must be the computation of the area which is slow. An EXPLAIN VERBOSE suggests to me that the computation of the area is done as part of the index scan and then the result passed up to the sort, rather than being done in the sort itself. So it makes sense that the timing of doing this would be attributed to the index scan.
To improve the time needed to compute the area, you could compute and store it as part of the table. The best way to do that (with new enough version) is with a generated column.
alter table polygons add polygon_area double precision generated always as (st_area(geog)) stored;

Related

Why doesn't postresql use all columns in a multi-column index?

I am using the extension
CREATE EXTENSION btree_gin;
I have an index that looks like this...
create index boundaries2 on rets USING GIN(source, isonlastsync, status, (geoinfo::jsonb->'boundaries'), ctcvalidto, searchablePrice, ctcSortOrder);
before I started messing with it, the index looked like this, with the same results that I'm about to share, so it seems minor variations in the index definition don't make a difference:
create index boundaries on rets USING GIN((geoinfo::jsonb->'boundaries'), source, status, isonlastsync, ctcvalidto, searchablePrice, ctcSortOrder);
I give pgsql 11 this query:
explain analyze select id from rets where ((geoinfo::jsonb->'boundaries' ?| array['High School: Torrey Pines']) AND source='SDMLS'
AND searchablePrice>=800000 AND searchablePrice<=1200000 AND YrBlt>=2000 AND EstSF>=2300
AND Beds>=3 AND FB>=2 AND ctcSortOrder>'2019-07-05 16:02:54 UTC' AND Status IN ('ACTIVE','BACK ON MARKET')
AND ctcvalidto='9999-12-31 23:59:59 UTC' AND isonlastsync='true') order by LstDate desc, ctcSortOrder desc LIMIT 3000;
with result...
Limit (cost=120.06..120.06 rows=1 width=23) (actual time=472.849..472.850 rows=1 loops=1)
-> Sort (cost=120.06..120.06 rows=1 width=23) (actual time=472.847..472.848 rows=1 loops=1)
Sort Key: lstdate DESC, ctcsortorder DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on rets (cost=116.00..120.05 rows=1 width=23) (actual time=472.748..472.841 rows=1 loops=1)
Recheck Cond: ((source = 'SDMLS'::text) AND (((geoinfo)::jsonb -> 'boundaries'::text) ?| '{"High School: Torrey Pines"}'::text[]) AND (ctcvalidto = '9999-12-31 23:59:59+00'::timestamp with time zone) AND (searchableprice >= 800000) AND (searchableprice <= 1200000) AND (ctcsortorder > '2019-07-05 16:02:54+00'::timestamp with time zone))
Rows Removed by Index Recheck: 93
Filter: (isonlastsync AND (yrblt >= 2000) AND (estsf >= 2300) AND (beds >= 3) AND (fb >= 2) AND (status = ANY ('{ACTIVE,"BACK ON MARKET"}'::text[])))
Rows Removed by Filter: 10
Heap Blocks: exact=102
-> Bitmap Index Scan on boundaries2 (cost=0.00..116.00 rows=1 width=0) (actual time=471.762..471.762 rows=104 loops=1)
Index Cond: ((source = 'SDMLS'::text) AND (((geoinfo)::jsonb -> 'boundaries'::text) ?| '{"High School: Torrey Pines"}'::text[]) AND (ctcvalidto = '9999-12-31 23:59:59+00'::timestamp with time zone) AND (searchableprice >= 800000) AND (searchableprice <= 1200000) AND (ctcsortorder > '2019-07-05 16:02:54+00'::timestamp with time zone))
Planning Time: 0.333 ms
Execution Time: 474.311 ms
(14 rows)
The Question
Why are the columns status and isonlastsync not used by the Bitmap Index Scan on boundaries2?
It can do so if it predicts that filtering out those columns will be faster. This is usually the case if cardinality on columns is very low and you will fetch large enough portion of all rows; this is true for boolean like isonlastsync and usually true for status columns with just a few distinct values.
Rows Removed by Filter: 10 this is very little to filter out, either because your table does not hold large number of rows or most of them fit into condition you specified for those two columns. You might try generating more data in that table or selecting rows with rare status.
I suggest doing partial indexes (with WHERE condition), at least for boolean value and remove those two columns to make this index a bit more lightweight.
I cannot tell you why, but I can help you optimize the query.
You should not use a multi-column GIN index, but a GIN index on only the jsonb expression and a b-tree index on the other columns.
The order of columns matters: put the oned used in an equality condition first, with the most selective in the beginning. Next put the column with the must selective inequality or IN conditions. For the following columns, the order doesn't matter, as they will only act as filters in the index scan.
Make sure that the indexes are cached in RAM.
I'd expect you to be faster that way.
I think you're asking yourself the wrong question. As Lukasz answered already, PostgreSQL may find inefficient to check all columns in the index. The problem here is that your index is too big on disk.
Probably by trying to make this SQL faster you added as many columns as possible to the index, but it is backfiring.
The trick is to realize how much data PostgreSQL has to read to find your records. If your index contains too much data, it will have to read a lot. Also, be aware that low cardinality columns don't play well with BTree and common index types; generally you want to avoid indexing them.
To have an index as small as possible and it's quick to do lookups you have to find the column with more cardinality, or better, the column that will return less rows for your query. My guess is "ctcSortOrder". This will be the first column of your index.
Now look, after filtering by the 1st column, which column has now the most cardinality or will filter out most rows. I have no idea on your data, but "source" looks like a good candidate.
Try to avoid jsonb searches unless they are the primary source of the cardinality, and keep the index as a Btree. BTree is several times faster.
And like Lukasz suggested, look on partial indexes. For example add "WHERE Status IN ('ACTIVE','BACK ON MARKET') AND isonlastsync='true'" as these two may be common for all your searches.
Bottom line is, having a simpler, smaller index is faster than having all columns indexed. And the order of the columns does matter a lot. Stick with BTree unless there is a good reason (lots of cardinality in non-btree compatible types).
If your table is huge (>10M rows) consider table partitioning, for example by ctcSortOrder. But I don't think this is your case.

Slow distinct PostgreSQL query on nested jsonb field won't use index

I'm trying to get distinct values from a nested field on JSONB column, but it takes about 2 minutes on a 400K rows table.
The original query used DISTINCT but then I read that GROUP BY works better so tried this too, but no luck - still extremely slow.
Adding an index did not help either:
create index "orders_financial_status_index" on orders ((data ->'data'->> 'financial_status'));
ANALYZE EXPLAIN gave this result:
HashAggregate (cost=13431.16..13431.22 rows=4 width=32) (actual time=123074.941..123074.943 rows=4 loops=1)
Group Key: ((data -> 'data'::text) ->> 'financial_status'::text)
-> Seq Scan on orders (cost=0.00..12354.14 rows=430809 width=32) (actual time=2.993..122780.325 rows=434080 loops=1)
Planning time: 0.119 ms
Execution time: 123074.979 ms
It's worth mentioning that there are no null values on this column, and currently there are 4 unique values.
What should I do in order to query the distinct values faster?
No index will make this faster, because the query has to scan the whole table.
As you can see, the sequential scan uses almost all the time; the hash aggregate is fast.
Still I would not drop the index, because it allows PostgreSQL to estimate the number of groups accurately and decide on the more efficient hash aggregate rather than sorting the rows. You can try without the index to be sure.
However, two minutes for half a million rows is not very fast. Do you have slow storage? Is the table bloated? If the latter, VACUUM (FULL) should improve things.
You can speed up the query by reducing I/O. Load the table into RAM with pg_prewarm, then processing should be considerably faster.

PostgreSQL index not used for query on IP ranges

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:
CREATE TABLE ips (
id serial NOT NULL,
begin_ip_num bigint,
end_ip_num bigint,
country_name character varying(255),
CONSTRAINT ips_pkey PRIMARY KEY (id )
)
I've added plain B-tree indices on both begin_ip_num and end_ip_num:
CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num );
The query being used is:
SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;
The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:
Index Scan using index_ips_on_begin_ip_num on ips (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms
I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.
Try a multicolumn index, but with reversed order on the second column:
CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);
Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.
With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc.
This is still not very efficient and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.
Either way, CLUSTER using the multicolumn index from above can help performance:
CLUSTER ips USING index_ips_begin_end_ip_num
This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)
Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.
What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2. See:
Optimizing queries on a range of timestamps (two columns)
If your IP ranges can be covered with one of the built-in network types inet or cidr, consider to replace your two bigint columns. Or, better yet, look to the additional module ip4r by Andrew Gierth (not in the standard distribution. The indexing strategy changes accordingly.
Barring that, you can check out this related answer on dba.SE with using a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance:
Can spatial index help a “range - order by - limit” query
I had exactly this same problem on a nearly identical dataset from maxmind.com's free geiop table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best about 3 rows per second. With it I queried nearly 500000 rows in under 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them, here...
First of all, you must add a new column having the range type, note that int8range is required for bigint types. Next set its values appropriately, note that the '[]' parameter indicates to make the range inclusive at lower and upper bounds (rtfm). Finally add the index, note that the GiST index is where all the performance advantage comes from.
alter table ips add column iprange int8range;
update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]');
create index index_ips_on_iprange on ips using gist (iprange);
Having laid the groundwork, you can now use the '<#' contained-by operator to search specific addresses against the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html
SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <# iprange);
I'm a bit late to this party, but this is what works really well for me.
Consider installing ip4r extension. It basically allows you to define a column that can hold IP ranges. The name of the extension implies it is just for IPv4, but currently it is also support IPv6.
After you populate table with ranges within that column all you need, is to create GIST index:
CREATE INDEX ip_zip_ip4_range ON ip_zip USING gist (ip4_range);
I have almost 10 million ranges in my database, but queries take fraction of a milisecond:
region=> select count(*) from ip_zip ;
count
---------
9566133
region=> explain analyze select * from ip_zip where '8.8.8.8'::ip4 <<= ip4_range;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_zip (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.085..0.086 rows=1 loops=1)
Recheck Cond: ('8.8.8.8'::ip4r <<= ip4_range)
Heap Blocks: exact=1
-> Bitmap Index Scan on ip_zip_ip4_range (cost=0.00..232.16 rows=9566 width=0) (actual time=0.055..0.055 rows=1 loops=1)
Index Cond: ('8.8.8.8'::ip4r <<= ip4_range)
Planning time: 0.106 ms
Execution time: 0.118 ms
(7 rows)
region=> explain analyze select * from ip_zip where '254.50.22.54'::ip4 <<= ip4_range;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_zip (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.059..0.059 rows=1 loops=1)
Recheck Cond: ('254.50.22.54'::ip4r <<= ip4_range)
Heap Blocks: exact=1
-> Bitmap Index Scan on ip_zip_ip4_range (cost=0.00..232.16 rows=9566 width=0) (actual time=0.048..0.048 rows=1 loops=1)
Index Cond: ('254.50.22.54'::ip4r <<= ip4_range)
Planning time: 0.102 ms
Execution time: 0.145 ms
(7 rows)
I believe your query looks like WHERE [constant] BETWEEN begin_ip_num AND end_ipnum or
As far as I know Postgres doesn't have "AND-EQUAL " access plan, so you need to add a composite index on 2 columns as suggested by Erwin Brandstetter.

Postgres combining multiple Indexes

I have the following table/indexes -
CREATE TABLE test
(
coords geography(Point,4326),
user_id varchar(50),
created_at timestamp
);
CREATE INDEX ix_coords ON test USING GIST (coords);
CREATE INDEX ix_user_id ON test (user_id);
CREATE INDEX ix_created_at ON test (created_at DESC);
This is the query I want to execute:
select *
from updates
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000)
and user_id='3212312'
order by created_at desc
limit 60
When I run the query it only uses ix_coords index. How can I ensure that Postgres uses ix_user_id and ix_created_at index as well for the query?
This is a new table in which I did bulk insert of production data. Total rows in the test table: 15,069,489
I am running PostgreSQL 9.2.1 (with Postgis) with (effective_cache_size = 2GB). This is my local OSX with 16GB RAM, Core i7/2.5 GHz, non-SSD disk.
Adding the EXPLAIN ANALYZE output -
Limit (cost=71.64..71.65 rows=1 width=280) (actual time=1278.652..1278.665 rows=60 loops=1)
-> Sort (cost=71.64..71.65 rows=1 width=280) (actual time=1278.651..1278.662 rows=60 loops=1)
Sort Key: created_at
Sort Method: top-N heapsort Memory: 33kB
-> Index Scan using ix_coords on test (cost=0.00..71.63 rows=1 width=280) (actual time=0.198..1278.227 rows=178 loops=1)
Index Cond: (coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography)
Filter: (((user_id)::text = '4f1092000b921a000100015c'::text) AND ('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
Rows Removed by Filter: 3122459
Total runtime: 1278.701 ms
UPDATE:
Based on the suggestions below I tried index on cords + user_id:
CREATE INDEX ix_coords_and_user_id ON updates USING GIST (coords, user_id);
..but get the following error:
ERROR: data type character varying has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
UPDATE:
So the CREATE EXTENSION btree_gist; solved the btree/gist compound index issue. And now my index looks like
CREATE INDEX ix_coords_user_id_created_at ON test USING GIST (coords, user_id, created_at);
NOTE: btree_gist does not accept DESC/ASC.
New query plan:
Limit (cost=134.99..135.00 rows=1 width=280) (actual time=273.282..273.292 rows=60 loops=1)
-> Sort (cost=134.99..135.00 rows=1 width=280) (actual time=273.281..273.285 rows=60 loops=1)
Sort Key: created_at
Sort Method: quicksort Memory: 41kB
-> Index Scan using ix_updates_coords_user_id_created_at on updates (cost=0.00..134.98 rows=1 width=280) (actual time=0.406..273.110 rows=115 loops=1)
Index Cond: ((coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography) AND ((user_id)::text = '4e952bb5b9a77200010019ad'::text))
Filter: (('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
Rows Removed by Filter: 1
Total runtime: 273.331 ms
The query is performing better than before, almost a second better but still not great. I guess this is the best that I can get?? I was hoping somewhere around 60-80ms. Also taking order by created_at desc from the query, shaves off another 100ms, meaning it is unable to use the index. Anyway to fix this?
I don't know if Pg can combine a GiST index and regular b-tree indexes with a bitmap index scan, but I suspect not. You may be getting the best result you can without adding a user_id column to your GiST index (and consequently making it bigger and slower for other queries that don't use user_id).
As an experiment you could:
CREATE EXTENSION btree_gist;
CREATE INDEX ix_coords_and_user_id ON test USING GIST (coords, user_id);
which is likely to result in a big index, but might boost that query - if it works. Be aware that maintaining such an index will significantly slow INSERT and UPDATEs. If you drop the old ix_coords your queries will use ix_coords_and_user_id even if they don't filter on user_id, but it'll be slower than ix_coords. Keeping both will make the INSERT and UPDATE slowdown even worse.
See btree-gist
(Obsoleted by edit to question that changes the question completely; when written the user had a multicolumn index they've now split into two separate ones):
You don't seem to be filtering or sorting on user_id, only create_date. Pg won't (can't?) use only the second term of a multi-column index like (user_id, create_date), it needs use of the first item too.
If you want to index create_date, create a separate index for it. If you use and need the (user_id, create_date) index and don't generally use just user_id alone, see if you can reverse the column order. Alternately create two independent indexes, (user_id) and (create_date). When both columns are needed Pg can combine the two indepependent indexes using a bitmap index scan.
I think Craig is correct with his answer, but I just wanted to add a few things (and it wouldn't fit in a comment)
You have to work pretty hard to force PostgreSQL to use an index. The Query optimizer is smart and there are times where it will believe that a sequential table scan will be faster. It is usually right! :) But, you can play with some settings (such as seq_page_cost, random_page_cost, etc) you can play with to try and get it to favor an index. Here is a link to some of the configurations that you might want to examine if you feel like it is not making the correct decision. But, again... my experience is that most of the time, Postgres is smarter than I am! :)
Hope this helps you (or someone in the future).

How to influence the Postgres Query Analyzer when dealing with text search and geospatial data

I have a quite serious performance issue with the following statement that i can't fix myself.
Given Situation
I have a postgres 8.4 Database with Postgis 1.4 installed
I have a geospatial table with ~9 Million entries. This table has a (postgis) geometry column and a tsvector column
I have a GIST Index on the geometry and a VNAME Index on the vname column
Table is ANALYZE'd
I want to excecute a to_tsquerytext search within a subset of these geometries that should give me all affecteded ids back.
The area to search in will limit the 9 Million datasets to approximately 100.000 and the resultset of the ts_query inside this area will most likely give an output of 0..1000 Entries.
Problem
The query analyzer decides that he wants to do an Bitmap Index Scan on the vname first, and then aggreates and puts a filter on the geometry (and other conditions I have in this statement)
Query Analyzer output:
Aggregate (cost=12.35..12.62 rows=1 width=510) (actual time=5.616..5.616 rows=1 loops=1)
-> Bitmap Heap Scan on mxgeom g (cost=8.33..12.35 rows=1 width=510) (actual time=5.567..5.567 rows=0 loops=1)
Recheck Cond: (vname ## '''hemer'' & ''hauptstrasse'':*'::tsquery)
Filter: (active AND (geom && '0107000020E6100000010000000103000000010000000B0000002AFFFF5FD15B1E404AE254774BA8494096FBFF3F4CC11E40F37563BAA9A74940490200206BEC1E40466F209648A949404DF6FF1F53311F400C9623C206B2494024EBFF1F4F711F404C87835954BD4940C00000B0E7CA1E4071551679E0BD4940AD02004038991E40D35CC68418BE49408EF9FF5F297C1E404F8CFFCB5BBB4940A600006015541E40FAE6468054B8494015040060A33E1E4032E568902DAE49402AFFFF5FD15B1E404AE254774BA84940'::geometry) AND (mandator_id = ANY ('{257,1}'::bigint[])))
-> Bitmap Index Scan on gis_vname_idx (cost=0.00..8.33 rows=1 width=0) (actual time=5.566..5.566 rows=0 loops=1)
Index Cond: (vname ## '''hemer'' & ''hauptstrasse'':*'::tsquery)
which causes a LOT of I/O - AFAIK It would be smarter to limit the geometry first, and do the vname search after.
Attempted Solutions
To achieve the desired behaviour i tried to
I Put the geom ## AREA into a subselect -> Did not change the execution plan
I created a temporary view with the desired area subset -> Did not change the execution plan
I created a temporary table of the desired area -> Takes 4~6 seconds to create, so that made it even worse.
Btw, sorry for not posting the actual query: I think my boss would really be mad at me if I did, also I'm looking more for theoretical pointers for someone to fix my actual query. Please ask if you need further clarification
EDIT
Richard had a very good point: You can achieve the desired behaviour of the Query Planner with the width statement. The bad thing is that this temporary Table (or CTE) messes up the vname index, thus making the query return nothing in some cases.
I was able to fix this with creating a new vname on-the-fly with to_tsvector(), but this is (too) costly - around 300 - 500ms per query.
My Solution
I ditched the vname search and went with a simple LIKE('%query_string%') (10-20 ms/query), but this is only fast in my given environment. YMMV.
There have been some improvements in statistics handling for tsvector (and I think PostGIS too, but I don't use it). If you've got the time, it might be worth trying again with a 9.1 release and see what that does for you.
However, for this single query you might want to look at the WITH construct.
http://www.postgresql.org/docs/8.4/static/queries-with.html
If you put the geometry part as the WITH clause, it will be evaluated first (guaranteed) and then that result-set will filtered by the following SELECT. It might end up slower though, you won't know until you try.
It might be an adjustment to work_mem would help too - you can do this per-session ("SET work_mem = ...") but be careful with setting it too high - concurrent queries can quickly burn through all your RAM.
http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY