Postgres returns records in wrong order

I have a pretty big table, messages. It contains about 100 million records.
When I run a simple query:
select uid from messages order by uid asc limit 1000
The result is very strange. Records at the beginning are fine, but further on they are not always ordered by the uid column.
uid
----------
94621458
94637590
94653611
96545014
96553145
96581957
96590621
102907437
.....
446131576
459475933
424507749
424507166
459474125
431059132
440517049
446131301
475651666
413687676
.....
Here is the EXPLAIN ANALYZE output:
Limit (cost=0.00..3740.51 rows=1000 width=4) (actual time=0.009..4.630 rows=1000 loops=1)
Output: uid
-> Index Scan using messages_pkey on public.messages (cost=0.00..376250123.91 rows=100587944 width=4) (actual time=0.009..4.150 rows=1000 loops=1)
Output: uid
Total runtime: 4.796 ms
PostgreSQL 9.1.12
The table is always under high load (inserts, updates, deletes) and is almost constantly being autovacuumed. Could that cause the problem?
UPD. Added the table definition. Sorry, I cannot post the full table definition, but all important fields and their types are here.
# \d+ messages
Table "public.messages"
Column | Type | Modifiers | Storage | Description
--------------+-----------------------------+--------------------------------------------------------+----------+-------------
uid | integer | not null default nextval('messages_uid_seq'::regclass) | plain |
code | character(22) | not null | extended |
created | timestamp without time zone | not null | plain |
accountid | integer | not null | plain |
status | character varying(20) | not null | extended |
hash | character(3) | not null | extended |
xxxxxxxx | timestamp without time zone | not null | plain |
xxxxxxxx | integer | | plain |
xxxxxxxx | character varying(255) | | extended |
xxxxxxxx | text | not null | extended |
xxxxxxxx | character varying(250) | not null | extended |
xxxxxxxx | text | | extended |
xxxxxxxx | text | not null | extended |
Indexes:
"messages_pkey" PRIMARY KEY, btree (uid)
"messages_unique_code" UNIQUE CONSTRAINT, btree (code)
"messages_accountid_created_idx" btree (accountid, created)
"messages_accountid_hash_idx" btree (accountid, hash)
"messages_accountid_status_idx" btree (accountid, status)
Has OIDs: no

Here's a very general answer:
Try:
SET enable_indexscan TO off;
Rerun the query in the same session.
If the order of the results differs from the run with enable_indexscan on, then the index is corrupted.
In this case, fix it with:
REINDEX INDEX index_name;
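A concrete sketch of the whole check, using the names from the question (it also disables bitmap scans, so the suspect index is bypassed for row lookup as well, not just for ordering):

SET enable_indexscan TO off;
SET enable_bitmapscan TO off;  -- keep the planner away from the suspect index entirely
SELECT uid FROM messages ORDER BY uid ASC LIMIT 1000;
RESET enable_indexscan;
RESET enable_bitmapscan;
-- If the ordering above differs from the index-scan result:
REINDEX INDEX messages_pkey;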

Save yourself the long wait of trying to run the query without the index. The problem is most probably a corrupted index. Repair it right away and see if that fixes the problem.
Since your table is always under high load, consider building a new index concurrently. Takes a bit longer, but does not block concurrent writes. Per documentation on REINDEX:
To build the index without interfering with production you should drop
the index and reissue the CREATE INDEX CONCURRENTLY command.
And under CREATE INDEX:
CONCURRENTLY
When this option is used, PostgreSQL will build the index without
taking any locks that prevent concurrent inserts, updates, or deletes
on the table; whereas a standard index build locks out writes (but not
reads) on the table until it's done. There are several caveats to be
aware of when using this option — see Building Indexes Concurrently.
So I suggest:
ALTER TABLE messages DROP CONSTRAINT messages_pkey;
CREATE UNIQUE INDEX CONCURRENTLY messages_pkey ON messages (uid);
ALTER TABLE messages ADD PRIMARY KEY USING INDEX messages_pkey;
The last step is just a tiny update to the system catalogs.
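If leaving the table without its primary key while the concurrent build runs is a concern, a variant (just a sketch; messages_pkey_new is a hypothetical temporary name) builds the replacement first. ALTER TABLE ... ADD CONSTRAINT ... USING INDEX renames the index to match the given constraint name:

CREATE UNIQUE INDEX CONCURRENTLY messages_pkey_new ON messages (uid);
ALTER TABLE messages
   DROP CONSTRAINT messages_pkey
 , ADD CONSTRAINT messages_pkey PRIMARY KEY USING INDEX messages_pkey_new;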

Related

How to make big postgres db faster?

I have a big Postgres database (around 75 GB) and queries are very slow. Is there any way to make them faster?
About database:
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------------------+----------+----------+-------------+---------------+------------+-------------
public | fingerprints | table | postgres | permanent | heap | 35 GB |
public | songs | table | postgres | permanent | heap | 26 MB |
public | songs_song_id_seq | sequence | postgres | permanent | | 8192 bytes |
\d+ fingerprints
Table "public.fingerprints"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
hash | bytea | | not null | | extended | | |
song_id | integer | | not null | | plain | | |
offset | integer | | not null | | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"ix_fingerprints_hash" hash (hash)
"uq_fingerprints" UNIQUE CONSTRAINT, btree (song_id, "offset", hash)
Foreign-key constraints:
"fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
No need to write to the database, only read. All queries are very simple, essentially:
SELECT song_id FROM fingerprints WHERE hash = X
EXPLAIN(analyze, buffers, format text) SELECT "song_id", "offset" FROM "fingerprints" WHERE "hash" = decode('eeafdd7ce9130f9697','hex');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_fingerprints_hash on fingerprints (cost=0.00..288.28 rows=256 width=8) (actual time=0.553..234.257 rows=871 loops=1)
Index Cond: (hash = '\xeeafdd7ce9130f9697'::bytea)
Buffers: shared hit=118 read=749
Planning Time: 0.225 ms
Execution Time: 234.463 ms
(5 rows)
234 ms looks fine when it is a single query. But in reality there are about 3000 queries at a time, which takes about 600 seconds. It is an audio recognition application, so the algorithm works that way.
About indexes:
CREATE INDEX "ix_fingerprints_hash" ON "fingerprints" USING hash ("hash");
For the connection pooler I use Odyssey.
A little info from the config:
shared_buffers = 4GB
huge_pages = try
work_mem = 582kB
maintenance_work_mem = 2GB
effective_io_concurrency = 200
max_worker_processes = 24
max_parallel_workers_per_gather = 12
max_parallel_maintenance_workers = 4
max_parallel_workers = 24
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 16GB
min_wal_size = 4GB
random_page_cost = 1.1
effective_cache_size = 12GB
Info about hardware:
Xeon 12 core (24 threads)
RAM DDR4 16 GB ECC
NVME disk
Would the database be faster if I bought enough RAM to hold the whole DB (128 GB, for example)? And what parameters should I change to tell Postgres to keep the DB in RAM?
I read several topics about pg_tune etc., but the experiments didn't show any good results.
Increasing the RAM so that everything can stay in cache (perhaps after using pg_prewarm to get it into cache in the first place) would certainly work. But it is expensive and shouldn't be necessary.
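For reference, a minimal pg_prewarm sketch, assuming the extension is available and using the object names from the question (with 16 GB of RAM the 35 GB table will not fit in cache, which is exactly the point above):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('fingerprints');          -- load the table into shared buffers
SELECT pg_prewarm('ix_fingerprints_hash');  -- and its index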
Having a hash index on something which is already a hashed value is probably not very helpful. Have you tried just a default (btree) index instead?
If you CLUSTER the table on the index over the column named "hash" (which you can only do if it is a btree index) then rows with the same hash code should mostly share the same table page, which would greatly cut down on the number of different buffer reads needed to fetch them all.
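A sketch of that approach, using the names from the question (ix_fingerprints_btree is a made-up name; note that CLUSTER takes an exclusive lock and rewrites the table, which is acceptable for a read-only DB):

DROP INDEX ix_fingerprints_hash;
CREATE INDEX ix_fingerprints_btree ON fingerprints USING btree (hash);
CLUSTER fingerprints USING ix_fingerprints_btree;
ANALYZE fingerprints;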
If you could get it to do a bitmap heap scan instead of an index scan, it should be able to keep a large number of read requests outstanding at a time, thanks to effective_io_concurrency. But the planner does not account for effective_io_concurrency when planning, which means it won't choose a bitmap heap scan specifically to get that benefit. Normally an index read finding hundreds of rows on different pages would automatically choose a bitmap heap scan method, but in your case it is probably the low setting of random_page_cost which inhibits that. The low setting of random_page_cost is probably reasonable in itself, but it does have this unfortunate side effect.
A problem with this strategy is that it doesn't reduce the overall amount of IO needed, it just allows the requests to overlap and so make better use of multiple IO channels. But if many sessions run many instances of this query at the same time, they will start filling up those channels and competing with each other. So the CLUSTER method is probably superior, as it gets the same answer with less IO.
If you want to play around with bitmap scans, you could temporarily increase random_page_cost or temporarily set enable_indexscan to off.
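For example, a session-local experiment (a sketch; SET LOCAL reverts when the transaction ends):

BEGIN;
SET LOCAL enable_indexscan = off;   -- or: SET LOCAL random_page_cost = 4.0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT song_id, "offset" FROM fingerprints
WHERE  hash = decode('eeafdd7ce9130f9697', 'hex');
ROLLBACK;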
No need to write to database, only read.
So the DB is read-only.
And in comments:
db worked fine on small amount of data(few GB), but after i filled out database started to slowdown.
So indexes have been built up incrementally.
Indexes
UNIQUE CONSTRAINT on (song_id, "offset", hash)
I would replace that with:
ALTER TABLE fingerprints
DROP CONSTRAINT uq_fingerprints
, ADD CONSTRAINT uq_fingerprints UNIQUE (hash, song_id, "offset") WITH (FILLFACTOR = 100);
This enforces the same constraint, but the leading hash column in the underlying B-tree index now supports the filter on hash in your displayed query. And the fact that all needed columns are included in the index, further allows much faster index-only scans. The (smaller) index should also be more easily cached than the (bigger) table (plus index).
See:
Is a composite index also good for queries on the first field?
This also rewrites the index in pristine condition, with FILLFACTOR 100 for the read-only DB (instead of the default 90 for a B-tree index).
Hash index on (hash) and CLUSTER
The name of the column "hash" has nothing to do with the name of the index type, which also happens to be "hash". (The column should probably not be named "hash" to begin with.)
If (and only if) you also have other queries centered around one or a few hash values that cannot use index-only scans (and you actually see faster queries than without it), keep the hash index additionally, and optimize it. (Else drop it!)
ALTER INDEX ix_fingerprints_hash SET (FILLFACTOR = 100);
An incrementally grown index may end up with bloat or, in the case of a hash index, unbalanced overflow pages. REINDEX takes care of that. While you are at it, increase FILLFACTOR to 100 (from the default 75 for a hash index) for your read-only (!) DB, then REINDEX to make the change effective.
REINDEX INDEX ix_fingerprints_hash;
Or you can CLUSTER (like jjanes already suggested) on the rearranged B-tree index from above:
CLUSTER fingerprints USING uq_fingerprints;
Rewrites the table and all indexes; rows are physically sorted according to the given index, so "clustered" around the leading column(s). Effects are permanent for your read-only DB. But index-only scans do not benefit from this.
When done optimizing, run once:
VACUUM ANALYZE fingerprints;
work_mem
The tiny setting for work_mem stands out:
work_mem = 582kB
Even the (very conservative!) default is 4MB.
But after reading your question again, it would seem you only have tiny queries. So maybe that's ok after all.
Else, with 16 GB RAM you can typically afford 100 times as much. It depends on your workload, of course.
Many small queries, many parallel workers --> keep small work_mem (like 4MB?)
Few big queries, few parallel workers --> go high (like 256MB? or more)
Large amounts of temporary files written in your database over time, and mentions of "disk" in the output of EXPLAIN ANALYZE would indicate the need for more work_mem.
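One way to check for that, as a sketch, is the cumulative counters in pg_stat_database:

SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM   pg_stat_database
ORDER  BY temp_bytes DESC;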
Additional questions
Will the database be accelerated by purchase more RAM to handle all DB inside (128 GB in example)?
More RAM almost always helps until the whole DB can be cached in RAM and all processes can afford all the work_mem they desire.
And what parameters should I change to say to Postgres to store db in ram?
Everything that's read from the database is cached automatically in system cache and Postgres cache, up to the limit of available RAM. (Setting work_mem too high competes for that same resource.)

Can Postgres table partitioning by hash boost query performance?

I have an OLTP table on a Postgres 14.2 database that looks something like this:
Column | Type | Nullable |
----------------+-----------------------------+-----------
id | character varying(32) | not null |
user_id | character varying(255) | not null |
slug | character varying(255) | not null |
created_at | timestamp without time zone | not null |
updated_at | timestamp without time zone | not null |
Indexes:
"playground_pkey" PRIMARY KEY, btree (id)
"playground_user_id_idx" btree (user_id)
The database host has 8GB of RAM and 2 CPUs.
I have roughly 500M records in the table which adds up to about 80GB in size.
The table gets about 10K INSERT/h, 30K SELECT/h, and 5K DELETE/h.
The main query run against the table is:
SELECT * FROM playground WHERE user_id = '12345678' and slug = 'some-slug' limit 1;
Users have anywhere between 1 record to a few hundred records.
Thanks to the index on the user_id I generally get decent performance (double-digit milliseconds), but 5%-10% of the queries will take a few hundred milliseconds to maybe a second or two at worst.
My question is this: would partitioning the table by hash(user_id) help me boost lookup performance by taking advantage of partition pruning?
No, that wouldn't improve the speed of the query at all, since there is an index on that attribute. If anything, the increased planning time will slow down the query.
If you want to speed up that query as much as possible, create an index that supports both conditions:
CREATE INDEX ON playground (user_id, slug);
If slug is large, it may be preferable to index a hash:
CREATE INDEX ON playground (user_id, hashtext(slug));
and query like this:
SELECT *
FROM playground
WHERE user_id = '12345678'
AND slug = 'some-slug'
AND hashtext(slug) = hashtext('some-slug')
LIMIT 1;
The equality condition on slug is still needed because hashtext can collide: the hash condition lets the index narrow down the candidate rows, while the recheck on the actual slug value guarantees correctness.
Of course, partitioning could be a good idea for other reasons, for example to speed up autovacuum or CREATE INDEX.
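For illustration, a hash-partitioning sketch in PostgreSQL 14 syntax (partition names are made up; note that the primary key must then include the partition key):

CREATE TABLE playground (
    id         character varying(32)  NOT NULL,
    user_id    character varying(255) NOT NULL,
    slug       character varying(255) NOT NULL,
    created_at timestamp NOT NULL,
    updated_at timestamp NOT NULL,
    PRIMARY KEY (id, user_id)   -- id alone can no longer be the PK
) PARTITION BY HASH (user_id);

CREATE TABLE playground_p0 PARTITION OF playground FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE playground_p1 PARTITION OF playground FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... and likewise for remainders 2 and 3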

Optimize SELECT query with ORDER BY, OFFSET and LIMIT of postgresql

This is my table schema
Column | Type | Modifiers
-------------+------------------------+------------------------------------------------------
id | integer | not null default nextval('message_id_seq'::regclass)
date_created | bigint |
content | text |
user_name | character varying(128) |
user_id | character varying(128) |
user_type | character varying(8) |
user_ip | character varying(128) |
user_avatar | character varying(128) |
chatbox_id | integer | not null
Indexes:
"message_pkey" PRIMARY KEY, btree (id)
"idx_message_chatbox_id" btree (chatbox_id)
"indx_date_created" btree (date_created)
Foreign-key constraints:
"message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE
This is the query
SELECT *
FROM message
WHERE chatbox_id=$1
ORDER BY date_created
OFFSET 0
LIMIT 20;
($1 will be replaced by the actual ID)
It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM, and then the whole system goes down. I have to temporarily back up all the current messages and truncate the table. I am not sure what is going on, because everything is OK when I have about 2 million records.
I am using PostgreSQL Server 9.1.5 with default options.
Update: the output of EXPLAIN ANALYZE
Limit (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
-> Index Scan Backward using indx_date_created on message (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
Filter: (chatbox_id = 25065)
Total runtime: 0.376 ms
(4 rows)
Update: server specification
Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB
Final solution
Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested, and also create a new index as A.H. suggested.
You could try to give PostgreSQL a better index for that query. I propose something like this:
create index invent_suitable_name on message(chatbox_id, date_created);
or
create index invent_suitable_name on message(chatbox_id, date_created desc);
Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.
For the case when Postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration Postgres normally doesn't consume much RAM).
UPD: My guess for the reason of the bad performance:
At some point in time the table became too big for a full scan to collect accurate statistics, so after another ANALYZE PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:
1. An index scan on chatbox_id;
2. Sorting the returned records to get the top 20.
Because of the default configuration and the large number of records returned in step 1, Postgres was forced to do the sorting in files on disk. As a result: bad performance.
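A quick way to test that guess (a sketch, reusing the chatbox id from the posted plan): run the query under EXPLAIN ANALYZE and look at the Sort node, if any.

EXPLAIN ANALYZE
SELECT * FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
OFFSET 0 LIMIT 20;
-- "Sort Method: external merge  Disk: ...kB" would confirm sorting in disk files;
-- with an index on (chatbox_id, date_created), the Sort node should disappear entirely.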
UPD2: EXPLAIN ANALYZE shows a 0.376 ms runtime and a good plan. Can you give details about a case with bad performance?

How do I know if the statistics of a Postgres table are up to date?

In pgAdmin, whenever a table's statistics are out-of-date, it prompts:
Running VACUUM recommended
The estimated rowcount on the table schema.table deviates
significantly from the actual rowcount. You should run VACUUM ANALYZE
on this table.
I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.
Let's say I'm making a web-based system in Java, how do I detect if a table is out-of-date, so that I can show a prompt like the one in pgAdmin?
Because of the nature of my application, here are a few rules I have to follow:
I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.
I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)
I can't run vacuum/analyze every time to make it up-to-date.
When a user selects a table, I need to show the prompt that the table is outdated when there are any changes to this table (such as deletes, inserts, and updates) that are not reflected in pg_stats and pg_statistic.
It seems that it's not feasible by analyzing timestamps in pg_catalog.pg_stat_all_tables. Of course, if a table hasn't been analyzed before, I can check if it has a timestamp in last_analyze to find out whether the table is up-to-date. Using this method, however, I can't detect if the table is up-to-date when there's already a timestamp. In other words, no matter how many rows I add to the table, its last_analyze timestamp in pg_stat_all_tables is always for the first analyze (assuming the autovacuum flag is off). Therefore, I can only show the "Running VACUUM recommended" prompt for the first time.
It's also not feasible by comparing the last_analyze timestamp to the current timestamp. There might not be any updates to the table for days. And there might be tons of updates in one hour.
Given this scenario, how can I always tell if the statistics of a table are up-to-date?
Check the system catalogs.
=> SELECT schemaname, relname, last_autoanalyze, last_analyze FROM pg_stat_all_tables WHERE relname = 'accounts';
schemaname | relname | last_autoanalyze | last_analyze
------------+----------+-------------------------------+--------------
public | accounts | 2022-11-22 07:49:16.215009+00 |
(1 row)
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW
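Since last_analyze alone is not enough (as the question points out), here is a sketch that instead compares the planner's stored rowcount estimate with the statistics collector's counters; a large deviation suggests stale statistics, which is essentially the heuristic behind pgAdmin's prompt:

SELECT c.relname,
       c.reltuples::bigint   AS planner_estimate,
       s.n_live_tup          AS collector_estimate,
       s.n_mod_since_analyze AS changed_since_analyze  -- column added in 9.4, not in 8.4
FROM   pg_class c
JOIN   pg_stat_all_tables s ON s.relid = c.oid
WHERE  c.relname = 'accounts';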
All kinds of useful information in there:
test=# \d pg_stat_all_tables
View "pg_catalog.pg_stat_all_tables"
Column | Type | Modifiers
-------------------+--------------------------+-----------
relid | oid |
schemaname | name |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | bigint |
idx_tup_fetch | bigint |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |
n_tup_hot_upd | bigint |
n_live_tup | bigint |
n_dead_tup | bigint |
last_vacuum | timestamp with time zone |
last_autovacuum | timestamp with time zone |
last_analyze | timestamp with time zone |
last_autoanalyze | timestamp with time zone |
vacuum_count | bigint |
autovacuum_count | bigint |
analyze_count | bigint |
autoanalyze_count | bigint |
You should not have to worry about vacuuming from your application. Instead, you should have the autovacuum process configured on your server (in postgresql.conf), and the server takes care of VACUUM and ANALYZE based on its own internal statistics. You can configure how often it should run, and what the threshold variables are for it to process a table.
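The settings that control when autovacuum re-analyzes a table, shown here with their default values (a table is re-analyzed once the number of changed rows exceeds threshold + scale_factor * table size):

autovacuum = on
autovacuum_analyze_threshold = 50        # minimum number of changed rows
autovacuum_analyze_scale_factor = 0.1    # plus this fraction of the table's rows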

Oracle: TABLE ACCESS FULL with Primary key?

There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
In the same situation, SQL Server 2008 shows a clustered index scan. What is the reason?
select * with no where clause means: read every row in the table, fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, and read from the file.
What happens when you do a full table scan? You go to the first rowid in the table, then read on through the table to the end.
Which of these is faster given the table you have above? The full table scan. Why? Because it skips having to go to the index, retrieve values, and then go back to where the table lives to fetch the rows.
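To see the flip side, select only indexed columns. Since IDR, IDS, and DT are all NOT NULL and live in the primary key index, Oracle may then answer from the index alone (e.g. an INDEX FAST FULL SCAN) instead of a full table access. A sketch:

SQL>explain plan for select IDR, IDS, DT from temp;
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));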
To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this: the clustered index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan", or what Oracle calls "TABLE ACCESS FULL".