Optimize SELECT query with ORDER BY, OFFSET and LIMIT in PostgreSQL

This is my table schema
Column | Type | Modifiers
-------------+------------------------+------------------------------------------------------
id | integer | not null default nextval('message_id_seq'::regclass)
date_created | bigint |
content | text |
user_name | character varying(128) |
user_id | character varying(128) |
user_type | character varying(8) |
user_ip | character varying(128) |
user_avatar | character varying(128) |
chatbox_id | integer | not null
Indexes:
"message_pkey" PRIMARY KEY, btree (id)
"idx_message_chatbox_id" btree (chatbox_id)
"indx_date_created" btree (date_created)
Foreign-key constraints:
"message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE
This is the query
SELECT *
FROM message
WHERE chatbox_id=$1
ORDER BY date_created
OFFSET 0
LIMIT 20;
($1 will be replaced by the actual ID)
It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM and the whole system goes down. I have to temporarily back up all the current messages and truncate the table. I am not sure what is going on, because everything was fine with about 2 million records.
I am using PostgreSQL Server 9.1.5 with default options.
Update: the output of EXPLAIN ANALYZE
Limit (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
-> Index Scan Backward using indx_date_created on message (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
Filter: (chatbox_id = 25065)
Total runtime: 0.376 ms
(4 rows)
Update: server specification
Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB
Final solution
Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested and create a new index as A.H. suggested.

You could try to give PostgreSQL a better index for that query. I propose something like this:
create index invent_suitable_name on message(chatbox_id, date_created);
or
create index invent_suitable_name on message(chatbox_id, date_created desc);
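To check that the planner actually picks the new index up, you can re-run EXPLAIN ANALYZE against a representative chatbox (25065 is just the value taken from the plan shown above). With the composite index, the WHERE clause and the ORDER BY should both be satisfied by a single index scan, which can stop after 20 rows:
EXPLAIN ANALYZE
SELECT *
FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
LIMIT 20 OFFSET 0;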

Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.
For the case when Postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration Postgres normally doesn't consume much RAM).
Update: my guess for the reason of the bad performance:
At some point the table becomes too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table and, as a result, chose a bad plan that consisted of:
Index scan on chatbox_id;
Ordering of returned records to get top 20.
Because of the default configuration and the large number of records returned in step 1, Postgres was forced to do the sorting in files on disk. As a result: bad performance.
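If you hit the bad case again, one way to confirm whether sorts are spilling to disk is to log temporary files and look at the sort method in the plan. A sketch (the settings are standard GUCs, and the chatbox_id is just the one from the plan above):
SET log_temp_files = 0;   -- log every temporary file created (needs superuser)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM message WHERE chatbox_id = 25065 ORDER BY date_created LIMIT 20 OFFSET 0;
-- a plan node reporting "Sort Method: external merge  Disk: ... kB" (or temp-file
-- entries in the server log) means work_mem was too small for the sort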
Update 2: EXPLAIN ANALYZE shows 0.376 ms and a good plan. Can you give details about a case with bad performance?

Related

How to make big postgres db faster?

I have a big Postgres database (around 75 GB) and queries are very slow. Is there any way to make them faster?
About database:
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------------------+----------+----------+-------------+---------------+------------+-------------
public | fingerprints | table | postgres | permanent | heap | 35 GB |
public | songs | table | postgres | permanent | heap | 26 MB |
public | songs_song_id_seq | sequence | postgres | permanent | | 8192 bytes |
\d+ fingerprints
Table "public.fingerprints"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
hash | bytea | | not null | | extended | | |
song_id | integer | | not null | | plain | | |
offset | integer | | not null | | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"ix_fingerprints_hash" hash (hash)
"uq_fingerprints" UNIQUE CONSTRAINT, btree (song_id, "offset", hash)
Foreign-key constraints:
"fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
(Images of the DB schema and data volume omitted.)
No need to write to the database, only read. All queries are very simple:
SELECT song_id FROM fingerprints WHERE hash = X;
EXPLAIN(analyze, buffers, format text) SELECT "song_id", "offset" FROM "fingerprints" WHERE "hash" = decode('eeafdd7ce9130f9697','hex');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_fingerprints_hash on fingerprints (cost=0.00..288.28 rows=256 width=8) (actual time=0.553..234.257 rows=871 loops=1)
Index Cond: (hash = '\xeeafdd7ce9130f9697'::bytea)
Buffers: shared hit=118 read=749
Planning Time: 0.225 ms
Execution Time: 234.463 ms
(5 rows)
234 ms looks fine when it is a single query. But in reality there are about 3000 such queries at a time, which takes about 600 seconds. It is an audio recognition application, so that is how the algorithm works.
About indexes:
CREATE INDEX "ix_fingerprints_hash" ON "fingerprints" USING hash ("hash");
For the connection pooler I use Odyssey.
Little bit of info from config:
shared_buffers = 4GB
huge_pages = try
work_mem = 582kB
maintenance_work_mem = 2GB
effective_io_concurrency = 200
max_worker_processes = 24
max_parallel_workers_per_gather = 12
max_parallel_maintenance_workers = 4
max_parallel_workers = 24
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 16GB
min_wal_size = 4GB
random_page_cost = 1.1
effective_cache_size = 12GB
Info about hardware:
Xeon 12 core (24 threads)
RAM DDR4 16 GB ECC
NVME disk
Would the database be accelerated by buying more RAM so the whole DB fits in memory (128 GB, for example)? And what parameters should I change to tell Postgres to keep the DB in RAM?
I have read several topics about pgtune etc., but the experiments didn't show any good results.
Increasing the RAM so that everything can stay in cache (perhaps after using pg_prewarm to get it into cache in the first place) would certainly work. But it is expensive and shouldn't be necessary.
Having a hash index on something which is already a hashed value is probably not very helpful. Have you tried just a default (btree) index instead?
If you CLUSTER the table on the index over the column named "hash" (which you can only do if it is a btree index) then rows with the same hash code should mostly share the same table page, which would greatly cut down on the number of different buffer reads needed to fetch them all.
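A minimal sketch of that suggestion (the index name is made up; CLUSTER takes an exclusive lock and rewrites the table, so do it in a maintenance window):
CREATE INDEX ix_fingerprints_hash_btree ON fingerprints USING btree (hash);
CLUSTER fingerprints USING ix_fingerprints_hash_btree;   -- physically reorders the rows
ANALYZE fingerprints;                                     -- refresh statistics afterwards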
If you could get it to do a bitmap heap scan instead of an index scan, it should be able to have a large number of read requests outstanding at a time, due to effective_io_concurrency. But the planner does not account for effective_io_concurrency when planning, which means it won't choose a bitmap heap scan specifically to get that benefit. Normally an index read finding hundreds of rows on different pages would automatically choose a bitmap heap scan, but in your case it is probably the low setting of random_page_cost which is inhibiting it from doing so. The low setting of random_page_cost is probably reasonable in itself, but it does have this unfortunate side effect.
A problem with this strategy is that it doesn't reduce the overall amount of IO needed, it just allows the reads to overlap and so make better use of multiple IO channels. But if many sessions are running many instances of this query at the same time, they will start filling up those channels and compete with each other. So the CLUSTER method is probably superior, as it gets the same answer with less IO.
If you want to play around with bitmap scans, you could temporarily increase random_page_cost or temporarily set enable_indexscan to off.
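For that experiment, session-local settings are enough (both are standard planner GUCs; reset them afterwards):
SET enable_indexscan = off;          -- or: SET random_page_cost = 4;
EXPLAIN (ANALYZE, BUFFERS)
SELECT song_id, "offset"
FROM fingerprints
WHERE hash = decode('eeafdd7ce9130f9697', 'hex');
RESET enable_indexscan;              -- and/or: RESET random_page_cost;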
No need to write to database, only read.
So the DB is read-only.
And in comments:
the db worked fine on a small amount of data (a few GB), but after I filled it, the database started to slow down.
So indexes have been built up incrementally.
Indexes
UNIQUE CONSTRAINT on (song_id, "offset", hash)
I would replace that with:
ALTER TABLE fingerprints
DROP CONSTRAINT uq_fingerprints
, ADD CONSTRAINT uq_fingerprints UNIQUE(hash, song_id, "offset") WITH (FILLFACTOR = 100);
This enforces the same constraint, but the leading hash column in the underlying B-tree index now supports the filter on hash in your displayed query. And the fact that all needed columns are included in the index further allows much faster index-only scans. The (smaller) index should also be more easily cached than the (bigger) table (plus index).
See:
Is a composite index also good for queries on the first field?
The ALTER also rewrites the index in pristine condition, with FILLFACTOR 100 for the read-only DB (instead of the default 90 for a B-tree index).
Hash index on (hash) and CLUSTER
The name of the column "hash" has nothing to do with the name of the index type, which also happens to be "hash". (The column should probably not be named "hash" to begin with.)
If (and only if) you also have other queries centered on one or a few hash values that cannot use index-only scans (and you actually see faster queries than without it), keep the hash index additionally, and optimize it. (Else drop it!)
ALTER INDEX ix_fingerprints_hash SET (FILLFACTOR = 100);
An incrementally grown index may end up with bloat, or unbalanced overflow pages in the case of a hash index. REINDEX takes care of that. While being at it, increase FILLFACTOR to 100 (from the default 75 for a hash index) for your read-only (!) DB. REINDEX makes the change effective:
REINDEX INDEX ix_fingerprints_hash;
Or you can CLUSTER (like jjanes already suggested) on the rearranged B-tree index from above:
CLUSTER fingerprints USING uq_fingerprints;
Rewrites the table and all indexes; rows are physically sorted according to the given index, so "clustered" around the leading column(s). Effects are permanent for your read-only DB. But index-only scans do not benefit from this.
When done optimizing, run once:
VACUUM ANALYZE fingerprints;
work_mem
The tiny setting for work_mem stands out:
work_mem = 582kB
Even the (very conservative!) default is 4MB.
But after reading your question again, it seems you only have tiny queries, so maybe that's OK after all.
Else, with 16 GB of RAM you can typically afford 100 times as much. It depends on your workload, of course.
Many small queries, many parallel workers --> keep small work_mem (like 4MB?)
Few big queries, few parallel workers --> go high (like 256MB? or more)
Large amounts of temporary files written in your database over time, and mentions of "disk" in the output of EXPLAIN ANALYZE would indicate the need for more work_mem.
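work_mem does not have to be changed globally; it can be raised per session or per transaction for the few queries that need it (the values here are only illustrative):
SET work_mem = '64MB';          -- this session only
-- or, scoped to a single transaction:
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the big sort/hash query here ...
COMMIT;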
Additional questions
Will the database be accelerated by purchase more RAM to handle all DB inside (128 GB in example)?
More RAM almost always helps until the whole DB can be cached in RAM and all processes can afford all the work_mem they desire.
And what parameters should I change to say to Postgres to store db in ram?
Everything that's read from the database is cached automatically in system cache and Postgres cache, up to the limit of available RAM. (Setting work_mem too high competes for that same resource.)
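If you do add enough RAM and want the data cached right after a restart, pg_prewarm (mentioned in the other answer) is a standard contrib extension. A sketch, using the index name from above:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('fingerprints');      -- load the table into shared buffers
SELECT pg_prewarm('uq_fingerprints');   -- and the rewritten unique index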

Can Postgres table partitioning by hash boost query performance?

I have an OLTP table on a Postgres 14.2 database that looks something like this:
Column | Type | Nullable |
----------------+-----------------------------+-----------
id | character varying(32) | not null |
user_id | character varying(255) | not null |
slug | character varying(255) | not null |
created_at | timestamp without time zone | not null |
updated_at | timestamp without time zone | not null |
Indexes:
"playground_pkey" PRIMARY KEY, btree (id)
"playground_user_id_idx" btree (user_id)
The database host has 8GB of RAM and 2 CPUs.
I have roughly 500M records in the table which adds up to about 80GB in size.
The table gets about 10K INSERT/h, 30K SELECT/h, and 5K DELETE/h.
The main query run against the table is:
SELECT * FROM playground WHERE user_id = '12345678' and slug = 'some-slug' limit 1;
Users have anywhere between 1 record to a few hundred records.
Thanks to the index on the user_id I generally get decent performance (double-digit milliseconds), but 5%-10% of the queries will take a few hundred milliseconds to maybe a second or two at worst.
My question is this: would partitioning the table by hash(user_id) help me boost lookup performance by taking advantage of partition pruning?
No, that wouldn't improve the speed of the query at all, since there is an index on that attribute. If anything, the increased planning time will slow down the query.
If you want to speed up that query as much as possible, create an index that supports both conditions:
CREATE INDEX ON playground (user_id, slug);
If slug is large, it may be preferable to index a hash:
CREATE INDEX ON playground (user_id, hashtext(slug));
and query like this:
SELECT *
FROM playground
WHERE user_id = '12345678'
AND slug = 'some-slug'
AND hashtext(slug) = hashtext('some-slug')
LIMIT 1;
Of course, partitioning could be a good idea for other reasons, for example to speed up autovacuum or CREATE INDEX.
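For reference, the hash partitioning asked about would look roughly like this. Note that on a partitioned table the primary key has to include the partition key; the partition names and the modulus below are just illustrative:
CREATE TABLE playground (
    id         character varying(32)  NOT NULL,
    user_id    character varying(255) NOT NULL,
    slug       character varying(255) NOT NULL,
    created_at timestamp              NOT NULL,
    updated_at timestamp              NOT NULL,
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH (user_id);

CREATE TABLE playground_p0 PARTITION OF playground FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE playground_p1 PARTITION OF playground FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... remainders 2 and 3 ...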

Postgres: How to efficiently bucketize random event ids below (hour,config_id,sensor_id)

I have a large table "measurement" with 4 columns:
measurement-service=> \d measurement
Table "public.measurement"
Column | Type | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+---------
hour | timestamp without time zone | | not null |
config_id | bigint | | not null |
sensor_id | bigint | | not null |
event_id | uuid | | not null |
Partition key: RANGE (hour)
Indexes:
"hour_config_id_sensor_id_event_id_key" UNIQUE CONSTRAINT, btree (hour, config_id, sensor_id, event_id)
Number of partitions: 137 (Use \d+ to list them.)
An example of a partition name: "measurement_y2019m12d04"
And then I insert a lot of events as CSV via COPY into a temporary table, and from there I insert directly into the partition using ON CONFLICT DO NOTHING.
Example:
CREATE TEMPORARY TABLE tmp_measurement_y2019m12d04T02_12345 (
hour timestamp without time zone,
config_id bigint,
sensor_id bigint,
event_id uuid
) ON COMMIT DROP;
[...]
COPY tmp_measurement_y2019m12d04T02_12345 FROM STDIN DELIMITER ',' CSV HEADER;
INSERT INTO measurement_y2019m12d04 (SELECT * FROM tmp_measurement_y2019m12d04T02_12345) ON CONFLICT DO NOTHING;
I think I help Postgres by sending CSV data of the same hour only. Also, within that hour, I remove all duplicates from the CSV, so the CSV only contains unique rows.
But I send many batches for different hours. There is no order; it can be the hour of today, yesterday, last week, etc.
My approach has worked all right so far, but I think I have reached a limit now. The insertion speed has become very slow. While the CPU is mostly idle, I have 25% I/O wait. The subsystem is a RAID with several TB, using disks that are not SSDs.
maintenance_work_mem = 32GB
max_wal_size = 1GB
fsync = off
max_worker_processes = 256
wal_buffers = -1
shared_buffers = 64GB
temp_buffers = 4GB
effective_io_concurrency = 1000
effective_cache_size = 128GB
Each daily partition is around 20 GB and contains no more than 500M rows. And by maintaining the unique index per partition, I just duplicate the data once more.
The lookup speed, on the other hand, is quick.
I think the limit is the maintenance of the B-tree with the rather random UUIDs under (hour, config_id, sensor_id). I constantly change it, so it's written out and has to be re-read.
I am wondering if there is another approach. Basically I want uniqueness for (hour, config_id, sensor_id, event_id) and then a quick lookup per (hour, config_id, sensor_id).
I am considering removing the unique index and only having an index over (hour, config_id, sensor_id), and then providing uniqueness on the reader side. But that may slow down reading, as the event_id can no longer be delivered via the index when I look up via (hour, config_id, sensor_id); it has to access the actual row to get the event_id.
Or I could provide uniqueness via a hash index.
Any other ideas are welcome!
Thank you.
When you do the insert, you should specify an ORDER BY which matches the index of the table being inserted into:
INSERT INTO measurement_y2019m12d04
SELECT * FROM tmp_measurement_y2019m12d04T02_12345
ORDER BY hour, config_id, sensor_id, event_id;
Only if this fails to give enough improvement would I consider any of the other options you list.
Hash indexes don't provide uniqueness. You can simulate it with an exclusion constraint, but I think they are less efficient. Exclusion constraints support DO NOTHING, but not DO UPDATE. So as long as your use case does not evolve to want DO UPDATE, you would be good on that front, but I still doubt it would actually solve the problem. If your bottleneck is IO from updating the index, a hash index would only make it worse, as it is designed to scatter your data all over the place rather than focus it in a small cacheable area.
You also mention parallel processing. For inserting into the temp table, that might be fine. But I wouldn't do the INSERT...SELECT in parallel. If IO is your bottleneck, that would probably just make it worse. Of course if IO is no longer the bottleneck after my ORDER BY suggestion, then ignore this part.

When do queries with very large "WHERE x IN" clauses perform badly?

I have always been under the impression that queries with large WHERE x IN clauses perform badly.
For example, let's say I have the following table
psql> \d examples
| Column | Type | Modifiers
| id | integer | not null default nextval('id_seq'::regclass)
| flag | boolean | not null
| date | timestamp | not null
| comment | character varying(16) | not null
Indexes:
"example_pkey" PRIMARY KEY, btree (id)
If I query this table using a WHERE id IN clause with 100k ids, it actually returns fairly quickly (there are hundreds of thousands of rows in the table):
EXPLAIN ANALYSE SELECT * FROM examples WHERE id IN (1, 2, ...., 100000);
Index Scan using examples_pkey on examples (cost=0.15..15012.60 rows=610 width=104) (actual time=18.380..18.380 rows=0 loops=1)
Index Cond: (id = ANY ('{1, 2, ... 100000}'::integer[]))
Planning time: 45.832 ms
Execution time: 42.708 ms
The planner has rewritten the query to use id = ANY with my list of IDs so I assume this is just equivalent or the planner knows that's faster.
I have had other queries before where replacing large WHERE IN clauses with a subquery has greatly improved the speed of the query. My query here is very simple, so the question is:
Is it true that queries with large WHERE IN clauses actually perform badly? If so, in which cases is that the case (assuming the relevant columns are indexed, etc.)?
The parser transforms IN expressions with more than one element in the list into =ANY expressions.
If there is a B-tree index on the column, it can always be used.
The only performance problem I can imagine is that the optimizer may estimate the number of result rows wrongly and choose a sequential scan when an index scan would actually be faster.
Experiments show that the optimizer estimates one result row per list entry for unique columns, with a maximum of the number of rows in the table. This is a reasonable estimate, but if there are many elements in the list that do not match or there are duplicates, the estimate can easily be too high.
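When the list grows really long, a common alternative to a literal IN list is to pass the values as a set and join against them; for example (the short arrays below stand in for the real ID list):
-- join against an unnested array instead of a 100k-element IN list
SELECT e.*
FROM examples AS e
JOIN unnest(ARRAY[1, 2, 3]::integer[]) AS ids(id) ON ids.id = e.id;

-- or pass the list as a VALUES table
SELECT e.*
FROM examples AS e
JOIN (VALUES (1), (2), (3)) AS v(id) ON v.id = e.id;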

optimize a postgres query that updates a big table [duplicate]

I have two huge tables:
Table "public.tx_input1_new" (100,000,000 rows)
Column | Type | Modifiers
----------------|-----------------------------|----------
blk_hash | character varying(500) |
blk_time | timestamp without time zone |
tx_hash | character varying(500) |
input_tx_hash | character varying(100) |
input_tx_index | smallint |
input_addr | character varying(500) |
input_val | numeric |
Indexes:
"tx_input1_new_h" btree (input_tx_hash, input_tx_index)
Table "public.tx_output1_new" (100,000,000 rows)
Column | Type | Modifiers
--------------+------------------------+-----------
tx_hash | character varying(100) |
output_addr | character varying(500) |
output_index | smallint |
input_val | numeric |
Indexes:
"tx_output1_new_h" btree (tx_hash, output_index)
I want to update the first table from the other one:
UPDATE tx_input1 as i
SET
input_addr = o.output_addr,
input_val = o.output_val
FROM tx_output1 as o
WHERE
i.input_tx_hash = o.tx_hash
AND i.input_tx_index = o.output_index;
Before I execute this SQL command, I already created indexes for these two tables:
CREATE INDEX tx_input1_new_h ON tx_input1_new (input_tx_hash, input_tx_index);
CREATE INDEX tx_output1_new_h ON tx_output1_new (tx_hash, output_index);
I used the EXPLAIN command to see the query plan, but it didn't use the indexes I created.
It took about 14-15 hours to complete this UPDATE.
What is the problem within it?
How can I shorten the execution time, or tune my database/table?
Thank you.
Since you are joining two large tables and there are no conditions that could filter out rows, the only efficient join strategy will be a hash join, and no index can help with that.
First there will be a sequential scan of one of the tables, from which a hash structure is built, then there will be a sequential scan over the other table, and the hash will be probed for each row found. How could any index help with that?
You can expect such an operation to take a long time, but there are some ways in which you could speed up the operation:
Remove all indexes and constraints on tx_input1 before you begin. Your query is one of the examples where an index does not help at all, but actually hurts performance, because the indexes have to be updated along with the table. Recreate the indexes and constraints after you are done with the UPDATE. Depending on the number of indexes on the table, you can expect a decent to massive performance gain.
Increase the work_mem parameter for this one operation with the SET command, as high as you can. The more memory the hash operation can use, the faster it will be. With a table that big you'll probably still end up with temporary files, but you can still expect a decent performance gain (a session-level sketch follows at the end of this answer).
Increase checkpoint_segments (or max_wal_size from version 9.6 on) to a high value so that there are fewer checkpoints during the UPDATE operation.
Make sure that the table statistics on both tables are accurate, so that PostgreSQL can come up with a good estimate for the number of hash buckets to create.
After the UPDATE, if it affects a big number of rows, you might consider running VACUUM (FULL) on tx_input1 to get rid of the resulting table bloat. This will lock the table for a longer time, so do it during a maintenance window. It will reduce the size of the table and, as a consequence, speed up sequential scans.
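A minimal sketch of the session-level tuning mentioned above (the value is only illustrative; checkpoint_segments / max_wal_size are server-wide settings and need a config change plus reload rather than a plain SET):
SET work_mem = '1GB';   -- affects this session only
-- run the big UPDATE here
RESET work_mem;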