Optimize Postgres query on timestamp range - postgresql

I have the following table and indices defined:
CREATE TABLE ticket (
wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
eid bigint,
created timestamp with time zone NOT NULL DEFAULT now(),
status integer NOT NULL DEFAULT 0,
argsxml text,
moduleid character varying(255),
source_id bigint,
file_type_id bigint,
file_name character varying(255),
status_reason character varying(255),
...
)
I created an index on the created timestamp as follows:
CREATE INDEX ticket_1_idx
ON ticket
USING btree
(created );
Here's my query:
select * from ticket
where created between '2012-12-19 00:00:00' and '2012-12-20 00:00:00'
This was working fine until the number of records started to grow (about 5 million) and now it's taking forever to return.
Explain analyze reveals this:
Index Scan using ticket_1_idx on ticket (cost=0.00..10202.64 rows=52543 width=1297) (actual time=0.109..125.704 rows=53340 loops=1)
Index Cond: ((created >= '2012-12-19 00:00:00+00'::timestamp with time zone) AND (created <= '2012-12-20 00:00:00+00'::timestamp with time zone))
Total runtime: 175.853 ms
So far I've tried setting:
random_page_cost = 1.75
effective_cache_size = 3
Also created:
create CLUSTER ticket USING ticket_1_idx;
Nothing works. What am I doing wrong? Why is it selecting sequential scan? The indexes are supposed to make the query fast. Anything that can be done to optimize it?

CLUSTER
If you intend to use CLUSTER, the displayed syntax is invalid.
create CLUSTER ticket USING ticket_1_idx;
Run once:
CLUSTER ticket USING ticket_1_idx;
This can help a lot with bigger result sets. Less for a single or few rows returned.
If your table isn't read-only the effect deteriorates over time. Re-run CLUSTER at reasonable intervals. Postgres remembers the index for subsequent calls, so this works, too:
CLUSTER ticket;
(But I would rather be explicit and use the first form.)
However, if you have lots of updates, CLUSTER (or VACUUM FULL) may actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need for extending the underlying physical file (expensively) too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds:
Fillfactor for a sequential index that is PK
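A minimal sketch of combining a fill factor with CLUSTER. The value 90 is only an illustrative assumption; a suitable number depends on how often rows are updated:

```sql
-- Leave roughly 10% of each heap page free for new row versions.
-- 90 is an assumed starting point, not a value taken from the answer.
ALTER TABLE ticket SET (fillfactor = 90);

-- The setting only affects pages written from now on;
-- rewriting the table applies it to the existing rows as well.
CLUSTER ticket USING ticket_1_idx;
```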
pg_repack / pg_squeeze
CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:
When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired
on it. This prevents any other database operations (both reads and
writes) from operating on the table until the CLUSTER is finished.
Bold emphasis mine. Consider the alternatives!
pg_repack:
Unlike CLUSTER and VACUUM FULL it works online, without holding an
exclusive lock on the processed tables during processing. pg_repack is
efficient to boot, with performance comparable to using CLUSTER directly.
and:
pg_repack needs to take an exclusive lock at the end of the reorganization.
The current version 1.4.7 works with PostgreSQL 9.4 - 14.
pg_squeeze is a newer alternative that claims:
In fact we try to replace pg_repack extension.
The current version 1.4 works with Postgres 10 - 14.
Query
The query is simple enough not to cause any performance problems per se.
However: The BETWEEN construct includes boundaries. Your query selects all of Dec. 19, plus records from Dec. 20, 00:00. That's an extremely unlikely requirement. Chances are, you really want:
SELECT *
FROM ticket
WHERE created >= '2012-12-19 00:00'
AND created < '2012-12-20 00:00';
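The boundary behavior can be checked without touching the table. This sketch compares both predicates for two hand-picked timestamps:

```sql
SELECT ts,
       ts BETWEEN '2012-12-19 00:00' AND '2012-12-20 00:00' AS in_between,
       ts >= '2012-12-19 00:00' AND ts < '2012-12-20 00:00'  AS in_half_open
FROM  (VALUES ('2012-12-19 23:59:59'::timestamptz),
              ('2012-12-20 00:00:00'::timestamptz)) AS v(ts);
-- 2012-12-19 23:59:59: in_between = true, in_half_open = true
-- 2012-12-20 00:00:00: in_between = true, in_half_open = false
```

Only the half-open form excludes the row stamped exactly at midnight of Dec. 20.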
Performance
Why is it selecting sequential scan?
Your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.
You may be able to improve performance, but the necessary background information is not in the question. Possible options include:
Only query required columns instead of * to reduce transfer cost (and other performance benefits).
Look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.
If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
For example, if you mostly query the current month, you could create the following partial index:
CREATE INDEX ticket_created_idx ON ticket(created)
WHERE created >= '2012-12-01 00:00:00+00'::timestamptz;
CREATE a new index right before the start of a new month. You can easily automate the task with a cron job.
Optionally DROP partial indexes for old months later.
Keep the total index in addition for CLUSTER (which cannot operate on partial indexes). If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions.
Then again if records never change at all, you probably don't need CLUSTER.
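The monthly rotation described above might look like the following sketch. The new index name and the January boundary are assumptions for illustration, and the time zone offset assumes timestamps are meant in UTC:

```sql
-- Before January starts: build the index for the new month
-- without blocking writes. The index name is hypothetical.
CREATE INDEX CONCURRENTLY ticket_created_2013_01_idx
ON ticket (created)
WHERE created >= '2013-01-01 00:00:00+00'::timestamptz;

-- Later, once December is rarely queried: drop the old partial index.
DROP INDEX CONCURRENTLY IF EXISTS ticket_created_idx;
```

Neither CONCURRENTLY form can run inside a transaction block, and DROP INDEX CONCURRENTLY requires Postgres 9.2 or later. A query only uses a partial index when its WHERE clause implies the index predicate, so the lower bound in the query must not be earlier than the one in the index.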
Performance Basics
You may be missing one of the basics. All the usual performance advice applies:
https://wiki.postgresql.org/wiki/Slow_Query_Questions
https://wiki.postgresql.org/wiki/Performance_Optimization

Related

Query takes long time to run on postgreSQL database despite creating an index

Using PostgreSQL 14.3.1, I have created a database instance that is now 1TB in size. The main userlogs table is 751GB in size with 525GB used for data and 226GB used for various indexes on this table. The userlogs table currently contains over 900 million rows. In order to assist with querying this table, a separate Logdates table holds all unique dates for the user logs and there is an integer foreign key column for logdates created in userlogs called logdateID. Amongst the various indexes on the userlogs table, one of them is on logdateID. There are 104 date entries in Logdates table. When running the below query I would expect the index to be used and the 104 records to be retrieved in a reasonable period of time.
select distinct logdateid from userlogs;
This query took a few hours to return with the data. I did an explain plan on the query and the output is as shown below.
"HashAggregate (cost=80564410.60..80564412.60 rows=200 width=4)"
" Group Key: logdateid"
" -> Seq Scan on userlogs (cost=0.00..78220134.28 rows=937710528 width=4)"
I then issued the below command to request the database to use the index.
set enable_seqscan=off
The revised explain plan now comes as below:
"Unique (cost=0.57..3705494150.82 rows=200 width=4)"
" -> Index Only Scan using ix_userlogs_logdateid on userlogs (cost=0.57..3703149874.49 rows=937710528 width=4)"
However, when running the same query, it still takes a few hours to retrieve the data. My question is, why should it take that long to retrieve the data if it is doing an index only scan?
The machine on which the database sits is highly spec'd: a xeon 16-core processor, that with virtualisation enabled, gives 32 logical cores. There is 96GB of RAM and data storage is via a RAID 10 configured 2TB SSD disk with a separate 500GB system SSD disk.
There is no possibility to optimize such queries in PostgreSQL, due to the internal structure of the data storage into rows inside pages.
All queries involving an aggregate in PostgreSQL, such as COUNT, COUNT DISTINCT or DISTINCT, must read all rows inside the table pages to produce the result.
Take a look at the paper I wrote about this problem:
PostGreSQL vs Microsoft SQL Server – Comparison part 2 : COUNT performances
It seems like your table has none of its pages set as all visible (compare pg_class.relallvisible to the actual number of pages in the table), which is weird because even insert-only tables should get autovacuumed in v13 and up. This will severely punish the index-only scan. You can try to manually vacuum the table to see if that changes things.
It is also weird that it is not using parallelization. It certainly should be. What are your non-default configuration settings?
Finally, I wouldn't expect even the poor plan you show to take a few hours. Maybe your hardware is not performing up to what it should. (Also, RAID 10 requires at least 4 disks, but your description makes it sound like that is not what you have)
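The checks suggested above can be sketched as follows, assuming the table is named userlogs as in the question:

```sql
-- Compare all-visible pages with total pages. Both numbers are
-- estimates maintained by VACUUM and ANALYZE.
SELECT relname, relpages, relallvisible
FROM   pg_class
WHERE  relname = 'userlogs';

-- Set the visibility-map bits manually, then re-run the query above.
VACUUM (VERBOSE, ANALYZE) userlogs;

-- List settings that differ from the built-in defaults.
SELECT name, setting, source
FROM   pg_settings
WHERE  source NOT IN ('default', 'override');
```

If relallvisible stays close to zero after the VACUUM, something is preventing pages from being marked all-visible, for example a long-running transaction.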
Since you have the foreign key table, you could use that in your query, just testing each row that it has at least one row from the log table.
select logdateid from logdate where exists
(select 1 from userlogs where userlogs.logdateid=logdate.logdateid);
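A different technique, not taken from the answers here, is to emulate a "loose index scan" with a recursive CTE, as described on the PostgreSQL wiki. This sketch assumes the index on userlogs.logdateid from the question; it jumps from one distinct value to the next instead of reading every index entry:

```sql
WITH RECURSIVE t AS (
   -- Start with the smallest logdateid.
   (SELECT logdateid FROM userlogs ORDER BY logdateid LIMIT 1)
   UNION ALL
   -- Repeatedly fetch the next larger value via the index.
   SELECT (SELECT u.logdateid
           FROM   userlogs u
           WHERE  u.logdateid > t.logdateid
           ORDER  BY u.logdateid
           LIMIT  1)
   FROM   t
   WHERE  t.logdateid IS NOT NULL
)
SELECT logdateid
FROM   t
WHERE  logdateid IS NOT NULL;
```

With about 104 distinct values this needs on the order of a hundred short index probes, so it should pay off when the number of distinct values is tiny compared to the row count.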

PostgreSQL 11.5 doing sequential scan for SELECT EXISTS query

I have a multi tenant environment where each tenant (customer) has its own schema to isolate their data. Not ideal I know, but it was a quick port of a legacy system.
Each tenant has a "reading" table, with a composite index of 4 columns:
site_code char(8), location_no int, sensor_no int, reading_dtm timestamptz.
When a new reading is added, a function is called which first checks if there has already been a reading in the last minute (for the same site_code.location_no.sensor_no):
IF EXISTS (
SELECT
FROM reading r
WHERE r.site_code = p_site_code
AND r.location_no = p_location_no
AND r.sensor_no = p_sensor_no
AND r.reading_dtm > p_reading_dtm - INTERVAL '1 minute'
)
THEN
RETURN;
END IF;
Now, bear in mind there are many tenants, all behaving fine except 1. In 1 of the tenants, the call is taking nearly half a second rather than the usual few milliseconds because it is doing a sequential scan on a table with nearly 2 million rows instead of an index scan.
My random_page_cost is set to 1.5.
I could understand a sequential scan if the query were possibly returning many rows, but it only checks for the existence of any.
I've tried ANALYZE on the table, VACUUM FULL, etc but it makes no difference.
If I put "SET LOCAL enable_seqscan = off" before the query, it works perfectly... but it feels wrong, but it will have to be a temporary solution as this is a live system and it needs to work.
What else can I do to help Postgres make what is clearly the better decision of using the index?
EDIT: If I do a similar query manually (outside of a function) it chooses an index.
My guess is that the engine is evaluating the predicate and considers it not selective enough (thinks too many rows will be returned), so it decides to use a table scan instead.
I would do two things:
Make sure you have the correct index in place:
create index ix1 on reading (site_code, location_no,
sensor_no, reading_dtm);
Trick the optimizer by making the selectivity look better. You can do that by adding the extra [redundant] predicate and r.reading_dtm < :p_reading_dtm:
select 1
from reading r
where r.site_code = :p_site_code
and r.location_no = :p_location_no
and r.sensor_no = :p_sensor_no
and r.reading_dtm > :p_reading_dtm - interval '1 minute'
and r.reading_dtm < :p_reading_dtm
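To see whether the redundant upper bound changes the row estimate, the plan can be inspected with literal values. All values below are placeholders, not data from the question:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM   reading r
WHERE  r.site_code   = 'SITE0001'   -- placeholder value
AND    r.location_no = 1            -- placeholder value
AND    r.sensor_no   = 1            -- placeholder value
AND    r.reading_dtm > timestamptz '2019-10-01 12:00+00' - interval '1 minute'
AND    r.reading_dtm < timestamptz '2019-10-01 12:00+00';
```

Since the question notes that an ad-hoc query already picks the index, the plan chosen inside the function may differ from this one. The auto_explain module with auto_explain.log_nested_statements enabled is one way to log plans of statements executed inside functions.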

PostgreSQL performance tuning with table partitions

I am solving a performance issue on a PostgreSQL 9.6 dbo based system. Intro:
12yo system, similar to banking system, with most queried primary table called transactions.
CREATE TABLE jrn.transactions (
ID BIGSERIAL,
type_id VARCHAR(200),
account_id INT NOT NULL,
date_issued DATE,
date_accounted DATE,
amount NUMERIC,
..
)
In the table transactions we store all transactions within a bank account. Field type_id determines the type of a transaction. It also serves as the C# EntityFramework Discriminator column. Values are like:
card_payment, cash_withdrawl, cash_in, ...
14 types of transaction are known.
In general, there are 4 types of queries (no. 3 and no. 4 are by far the most frequent):
select single transaction like: SELECT * FROM jrn.transactions WHERE id = 3748734
select single transaction with JOIN to other transaction like: SELECT * FROM jrn.transactions AS m INNER JOIN jrn.transactions AS r ON m.refund_id = r.id WHERE m.id = 3748734
select 0-100, 100-200, .. transactions of given type like: SELECT * FROM jrn.transactions WHERE account_id = 43784 AND type_id = 'card_payment' LIMIT 100
several aggregate queries, like: SELECT SUM(amount), MIN(date_issued), MAX(date_issued) FROM jrn.transactions WHERE account_id = 3748734 AND date_issued >= '2017-01-01'
In the last few months we had unexpected row count growth, now 120M.
We are thinking of table partitioning, following to PostgreSQL doc: https://www.postgresql.org/docs/10/static/ddl-partitioning.html
Options:
partition table by type_id into 14 partitions
add column year and partition table by year (or year_month) into 12 (or 144) partitions.
I am now restoring data into our test environment, and I am going to test both options.
What do you consider the most appropriate partitioning rule for such situation? Any other options?
Thanks for any feedback / advice etc.
Partitioning won't be very helpful with these queries, since they won't perform a sequential scan, unless you forgot an index.
The only good reason I see for partitioning would be if you want to delete old rows efficiently; then partitioning by date would be best.
Based on your queries, you should have these indexes (apart from the primary key index):
CREATE INDEX ON jrn.transactions (account_id, date_issued);
CREATE INDEX ON jrn.transactions (refund_id);
The following index might be a good idea if you can sacrifice some insert performance to make the third query as fast as possible (you might want to test):
CREATE INDEX ON jrn.transactions (account_id, type_id);
What you have here is almost a perfect case for column-based storage as you may get it using a SAP HANA Database. However, as you explicitly have asked for a Postgres answer and I doubt that a HANA database will be within the budget limit, we will have to stick with Postgres.
Your two queries no. 3 and 4 go quite into different directions, so there won't be "the single answer" to your problem - you will always have to balance somehow between these two use cases. Yet, I would try to use two different techniques to approach each of them individually.
From my perspective, the biggest problem is the query no. 4, which creates quite a high load on your postgres server just because it is summing up values. Moreover, you are just summing up values over and over again, which most likely won't change often (or even at all), as you have said that UPDATEs nearly do not happen at all. I furthermore assume two more things:
transactions is INSERT-only, i.e. DELETE statements almost never happen (besides perhaps in cases of some exceptional administrative intervention).
The values of column date_issued when INSERTing typically are somewhere "close to today" - so you usually won't INSERT stuff way in the past.
Out of this, to prevent aggregating values over and over again unnecessarily, I would introduce yet another table: let's call it transactions_aggr, which is built up like this:
create table transactions_aggr (
account_id INT NOT NULL,
date_issued DATE,
sumamount NUMERIC,
primary key (account_id, date_issued)
)
which will give you a table of per-day preaggregated values.
To determine which values are already preaggregated, I would add another boolean-typed column to transactions, which indicates to me, which of the rows are contained in transactions_aggr and which are not (yet). The query no. 4 then would have to be changed in such a way that it reads only non-preaggregated rows from transactions, whilst the rest could come from transactions_aggr. To facilitate that you could define a view like this:
create view transactions_daily as  -- view name is illustrative
select account_id, date_issued, sum(amount) as sumamount
from (
    select account_id, date_issued, sumamount as amount
    from transactions_aggr
    union all
    select account_id, date_issued, amount
    from transactions
    where aggregated = false
) as combined
group by account_id, date_issued;
Needless to say that putting an index on transactions.aggregated (perhaps in conjunction with the account_id) could greatly help to improve the performance here.
Updating transactions_aggr can be done using multiple approaches:
You could use this as a one-time activity and only pre-aggregate the current set of ~120m rows once. This would at least reduce the load on your machine doing aggregations significantly. However, over time you will run into the same problem again. Then you may just re-execute the entire procedure, simply dropping transactions_aggr as a whole and re-create it from scratch (all the original data still is there in transactions).
You have a quiet period somewhere during the week/month/night, where few or no queries are coming in. Then you can open a transaction, read all transactions WHERE aggregated = false and add them with UPDATEs to transactions_aggr. Remember to then toggle aggregated to true (this should be done in the same transaction). The tricky part of this, however, is that you must pay attention to what reading queries will "see" of this transaction: depending on your accuracy requirements during the timeframe of this "update job", you may have to consider a stricter transaction isolation level than the default READ COMMITTED (for example REPEATABLE READ) to prevent phantom reads.
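The periodic job could be sketched as a single statement, assuming the boolean column is called aggregated as in the text above and Postgres 9.5 or later for ON CONFLICT. Flagging the rows and folding them into transactions_aggr happen atomically:

```sql
WITH moved AS (
    -- Flag the rows and hand exactly those rows to the INSERT below.
    UPDATE transactions
    SET    aggregated = true
    WHERE  aggregated = false
    AND    date_issued IS NOT NULL   -- date_issued is part of the target primary key
    RETURNING account_id, date_issued, amount
)
INSERT INTO transactions_aggr AS a (account_id, date_issued, sumamount)
SELECT account_id, date_issued, coalesce(sum(amount), 0)
FROM   moved
GROUP  BY account_id, date_issued
ON CONFLICT (account_id, date_issued)
DO UPDATE SET sumamount = a.sumamount + EXCLUDED.sumamount;
```

Because the INSERT consumes only the rows returned by the UPDATE, transactions inserted while the job runs keep aggregated = false and are picked up by the next run. Rows with a NULL date_issued are left untouched in this sketch.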
On the matter of your query no. 3, you could then really try partitioning based on type_id. However, your query looks a little strange to me: you are performing a LIMIT/OFFSET without any ordering specified (there is no ORDER BY in place). NB: You are not saying that you would be using database cursors. This may lead to the effect that the implicit order which is currently used changes if you enable partitioning on the table. So be careful about side effects this may cause in your program.
And one more thing: Before really doing the partition split, I would first check on the data distribution concerning type_id by issuing
select type_id, count(*) from transactions group by type_id
Otherwise it may turn out that, for example, 90% of your data is card_payment, so that you have a heavily uneven distribution amongst your partitions and the biggest performance-hogging queries still go into this single "large partition".
Hope this helps a little - and good luck!

First call of query on big table is surprisingly slow

I have a query that feels like it is taking more time than it should. This only applies to the first query for a given set of parameters, so when cached there is no issue.
I am not sure what to expect; however, given the setup and settings, I was hoping someone could shed some light on a few questions and give some insight into what can be done to speed up the query. The table in question is fairly large and Postgres estimates around 155963000 rows in it (14 GB).
Query
select ts, sum(amp) as total_amp, sum(230 * factor) as wh
from data_cbm_aggregation_15_min
where virtual_id in (1818) and ts between '2015-02-01 00:00:00' and '2015-03-31 23:59:59'
and deleted is null
group by ts
order by ts
When I started looking into this, the query took around 15 seconds; after some changes I have gotten it to around 10 seconds, which still seems long for a simple query like this. Here are the results from explain analyze: http://explain.depesz.com/s/97V1. Note that the reason why GroupAggregate returns the same number of rows is that this example only has one virtual_id being used, but there can be more.
Table and index
Table being queried, it has values inserted into it every 15 minutes
CREATE TABLE data_cbm_aggregation_15_min (
virtual_id integer NOT NULL,
ts timestamp without time zone NOT NULL,
amp real,
recs smallint,
min_amp real,
max_amp real,
deleted boolean,
factor real DEFAULT 0.25,
min_amp_ts timestamp without time zone,
max_amp_ts timestamp without time zone
)
ALTER TABLE data_cbm_aggregation_15_min ALTER COLUMN virtual_id SET STATISTICS 1000;
ALTER TABLE data_cbm_aggregation_15_min ALTER COLUMN ts SET STATISTICS 1000;
The index that is used in the query
CREATE UNIQUE INDEX idx_data_cbm_aggregation_15_min_virtual_id_ts
ON data_cbm_aggregation_15_min USING btree (virtual_id, ts DESC);
ALTER TABLE data_cbm_aggregation_15_min
CLUSTER ON idx_data_cbm_aggregation_15_min_virtual_id_ts;
Postgres settings
Other settings are default.
default_statistics_target = 100
maintenance_work_mem = 2GB
effective_cache_size = 11GB
work_mem = 256MB
shared_buffers = 3840MB
random_page_cost = 1
What I have tried
I have been following the Things to try before you post in https://wiki.postgresql.org/wiki/Slow_Query_Questions and the results in a bit more detail were as follows:
Fiddling with the Postgres settings, mostly lowering random_page_cost since the index scan, while it seems not too special is miles ahead of the bitmap heap scan it tried doing instead when the random_page_cost was higher.
Adding increased statistics to the virtual_id and ts columns which the index and WHERE conditions are based on. The query planner's estimated row count was much closer to the actual row count after changing this.
Clustering on the idx_data_cbm_aggregation_15_min_virtual_id_ts index did not seem to change much, not that I noticed.
Running VACUUM manually did not change much, I am already running autovacuum so this was no surprise.
Running REINDEX on the index shrunk it considerably (by almost 50%!) but it did not improve the speed by much.
A couple of small improvements
SELECT ts, sum(amp) AS total_amp, sum(factor) * 230 AS wh
FROM data_cbm_aggregation_15_min
WHERE virtual_id = 1818
AND ts >= '2015-02-01 00:00'
AND ts < '2015-04-01 00:00'
AND deleted IS NULL
GROUP BY ts
ORDER BY ts;
sum(230 * factor) - it's cheaper to multiply the sum once instead of multiplying each element: sum(factor) * 230. The result is the same, even with NULL values.
ts between '2015-02-01 00:00:00' and '2015-03-31 23:59:59' is potentially incorrect. To include all of March 2015, use the presented alternative. BETWEEN is translated to ts >= lower AND ts <= upper anyway. It is always slightly faster to spell it out.
virtual_id in (1818) is just a needlessly convoluted way to say virtual_id = 1818.
Better index, potentially bigger improvement
CREATE INDEX data_cbm_aggregation_15_min_special_idx
ON data_cbm_aggregation_15_min (virtual_id, ts, amp, factor)
WHERE deleted IS NULL;
I see nothing in your question that would suggest DESC in your original index. While Index Scan Backward is almost as fast as a plain Index Scan, it's still better to drop the modifier.
Most importantly, there are index-only scans since Postgres 9.2. The two index columns I appended (amp, factor) only make sense if you get index-only scans out of it.
Since you obviously are not interested in deleted rows, make it a partial index. Only pays if you have more than a few deleted rows in the table.
If you have other large parts of the table that can be excluded, add more conditions - and remember to repeat the condition in the query (even if it seems redundant) so Postgres understands that the index is applicable.
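Whether the new index actually yields an index-only scan can be verified with EXPLAIN. A sketch using the improved query from above:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT ts, sum(amp) AS total_amp, sum(factor) * 230 AS wh
FROM   data_cbm_aggregation_15_min
WHERE  virtual_id = 1818
AND    ts >= '2015-02-01 00:00'
AND    ts <  '2015-04-01 00:00'
AND    deleted IS NULL          -- repeats the index predicate
GROUP  BY ts
ORDER  BY ts;
```

Look for an "Index Only Scan" node on data_cbm_aggregation_15_min_special_idx and a low "Heap Fetches" count. A high count means many pages are not marked all-visible yet, which a VACUUM on the table should improve.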
Table definition
Reordering table columns like this would save 8 bytes per row:
CREATE TABLE data_cbm_aggregation_15_min (
virtual_id integer NOT NULL,
recs smallint,
deleted boolean,
ts timestamp NOT NULL,
amp real,
min_amp real,
max_amp real,
factor real DEFAULT 0.25,
min_amp_ts timestamp,
max_amp_ts timestamp
);
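The saving follows from alignment padding: in the original order the 8-byte-aligned ts after the 4-byte virtual_id costs 4 bytes of padding, amp and recs are followed by 2 more, and deleted by 3 more, for a data area of 56 bytes. The reordered layout needs only 1 byte of padding and 48 bytes. A sketch to check this with row values that mimic both column orders (the literal values are arbitrary):

```sql
SELECT pg_column_size(ROW(
          1::int, now()::timestamp, 1::real, 1::smallint, 1::real,
          1::real, true, 0.25::real, now()::timestamp, now()::timestamp
       )) AS original_order,
       pg_column_size(ROW(
          1::int, 1::smallint, true, now()::timestamp, 1::real,
          1::real, 1::real, 0.25::real, now()::timestamp, now()::timestamp
       )) AS reordered;
```

Both results include the tuple header, so only the difference between the two numbers matters here.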
Related:
Configuring PostgreSQL for read performance
Most important information for last
The first query call can be substantially more expensive for very big tables, since the whole table cannot be cached. Subsequent calls profit from the populated cache. Postgres caches blocks, not necessarily whole tables.
One more thing that can be important for the first call. Due to the MVCC model of Postgres it has to maintain visibility information. When reading pages of a table the first time since the last write operation, Postgres opportunistically updates visibility information, which can impose some extra cost for the first access (and help a lot for subsequent calls). More in the manual here. Related answer on dba.SE:
Why does a SELECT statement dirty cache buffers in Postgres?
About what you've tried so far
SET STATISTICS 1000 for ts and virtual_id was an excellent idea, but the effect was largely nullified by setting random_page_cost = 1, which basically forces an index scan for this query either way.
random_page_cost = 1 is telling Postgres that random access is just as cheap as sequential access. This makes sense for a DB that (almost) completely resides in cache. For a DB with huge tables like yours, this setting seems too extreme (even if it gets Postgres to favor the desired index scan). Set it to random_page_cost = 1.1 or probably higher.
A bitmap index scan is typically a good plan for the first call of the query you presented - for data distributed randomly across the table. Since you clustered the table just like you need it for this query, an index scan is more efficient. The question is: will your table stay clustered?
Your settings for work_mem and other resources depend on how much RAM you have, the speed of your disks, on access pattern, how many concurrent connections you typically have, what other programs on the server compete for resources, etc. work_mem = 256MB seems too high. You don't need nearly as much for the presented query. Setting it that high may actually harm performance, because it reduces RAM available to cache.
REINDEX is redundant immediately after CLUSTER, since that recreates all indexes anyway. You must have run REINDEX before CLUSTER, or you have heavy write access on the table to get so much bloat again already.
Various
Upgrade to Postgres 9.4 (or the upcoming 9.5, currently alpha). Version 9.2 is 3 years old now, the latest version has received many improvements.
The query plan suggests that nothing is actually aggregated. rows=4,117 are read from the index and rows=4,117 remain after GroupAggregate. Looks like rows are unique on ts already? Then you can remove the aggregation completely and make it a simple SELECT ...
If that's just a misleading EXPLAIN output and you typically output much fewer rows than are read, a MATERIALIZED VIEW with index on ts would be another option. Especially in combination with Postgres 9.4, which introduces REFRESH MATERIALIZED VIEW CONCURRENTLY.

How does this PostgreSQL query slow down when the number of rows increases?

I have a table briefly structured like this:
tn( id integer NOT NULL primary key DEFAULT nextval('tn_sequence'),
create_dt TIMESTAMP NOT NULL DEFAULT NOW(),
...............
deleted boolean );
create_dt is the timestamp when the row is inserted into the database.
deleted indicates that the row is no longer useful.
And I have the following queries:
select * from tn where create_dt > ( NOW() - interval '150 seconds' ) and deleted = FALSE;
select * from tn where create_dt < ( NOW() - interval '150 seconds' ) and deleted = FALSE;
My question is how these queries will slow down when the number of rows increases. For instance, when the number of rows exceeds 10K, 20K, or 100K, will it make a big impact on the speed? Is there any way I can optimize these queries? Note that every 5 seconds I will turn the column 'deleted' of rows which are older than 150 seconds into 'TRUE'.
The effect of table growth on performance will depend on the query plan chosen, available indexes, the selectivity of the query, and lots of other factors. EXPLAIN ANALYZE on the query might help. In short, if your query only selects a few rows and can use a simple b-tree index then it won't usually slow down tons, only a little as the index grows. On the other hand queries using complex non-indexed conditions or returning lots of rows could perform very badly indeed.
Your issue appears to mirror that in the question How should we handle rows which won't be queried once they are old in PostgreSQL?
The advice given there should apply:
Use a partial index with the condition WHERE (not deleted); or
partition on 'deleted' with constraint exclusion enabled.
For example, you might:
CREATE INDEX create_dt_when_not_deleted_idx
ON tn (create_dt)
WHERE (NOT deleted);
This includes only rows where deleted = 'f' (assuming deleted is NOT NULL) in the index. This isn't the same as having them gone from the table completely:
Nothing changes with full table sequential scans, the deleted='t' rows must still be scanned; and
There's more I/O than if the deleted = 't' rows weren't there because any given heap page is likely to contain a mix of deleted = 't' and deleted = 'f' rows.
You can reduce the impact of the latter by CLUSTERing on an index that includes deleted. Again, this will have no effect on sequential scans. To help with sequential scans you would have to partition the table on deleted.
Pg 9.2's index only scans should (I think, haven't tested) use the partial index. When an index only scan is possible the partial index should be as fast as an index on a table containing only the deleted = 'f' rows.
Note that you'll need to keep table and index bloat under control. Ensure autovacuum runs very frequently and use a current version of PostgreSQL that doesn't need things like a manually-managed free space map and has the latest, best-behaved autovacuum. I'd recommend 9.0 or above, preferably 9.1 or 9.2. Tune autovacuum to run aggressively.
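Aggressive autovacuum can be configured for this one table only. A sketch with purely illustrative thresholds that would need tuning against the real update rate:

```sql
-- Trigger autovacuum / autoanalyze on tn after a fixed number of
-- changed rows instead of a fraction of the table.
-- The numbers are assumptions, not recommendations from the answer.
ALTER TABLE tn SET (
    autovacuum_vacuum_scale_factor  = 0.0,
    autovacuum_vacuum_threshold     = 1000,
    autovacuum_analyze_scale_factor = 0.0,
    autovacuum_analyze_threshold    = 1000
);
```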
When tuning and testing performance - test your queries with EXPLAIN ANALYZE, don't just guess.
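For the first query from the question, such a test might look like this sketch:

```sql
EXPLAIN ANALYZE
SELECT *
FROM   tn
WHERE  create_dt > now() - interval '150 seconds'
AND    deleted = FALSE;
```

If a partial index such as create_dt_when_not_deleted_idx exists and the planner considers it worthwhile, it should show up in the plan as an index or bitmap scan on that index. Re-running the test at different table sizes shows how the timing actually develops.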