How to find fragmented indexes and defragment them in PostgreSQL?

I've found how we can solve this problem in SQL Server here, but how can I do it in PostgreSQL?

Normally you don't have to worry about that at all.
However, if there has been a mass delete or update, or the sustained change rate was so high that autovacuum couldn't keep up, you may end up with a badly bloated index.
The tool to determine that is the pgstattuple extension:
CREATE EXTENSION pgstattuple;
Then you can examine index bloat like this:
SELECT * FROM pgstatindex('spatial_ref_sys_pkey');
-[ RECORD 1 ]------+-------
version | 2
tree_level | 1
index_size | 196608
root_block_no | 3
internal_pages | 1
leaf_pages | 22
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 64.48
leaf_fragmentation | 13.64
This index is in excellent shape (never used): It has only 14% bloat.
Mind that indexes are by default created with a fillfactor of 90, that is, index blocks are not filled to more than 90% by INSERT.
It is hard to say when an index is bloated, but if leaf_fragmentation exceeds 50-60, it's not so pretty.
To reorganize an index, use REINDEX.
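A minimal sketch of the rebuild, reusing the index name from the output above. The CONCURRENTLY form is only available from PostgreSQL 12 on; which form fits depends on whether the table can tolerate blocked writes during the rebuild.

```sql
-- Rebuild the index from scratch. This blocks writes to the table
-- (and reads that would use this index) while it runs.
REINDEX INDEX spatial_ref_sys_pkey;

-- PostgreSQL 12 and later: rebuild without blocking concurrent writes.
-- Slower, and it cannot run inside a transaction block.
REINDEX INDEX CONCURRENTLY spatial_ref_sys_pkey;

-- Compare density and fragmentation with the values seen before.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('spatial_ref_sys_pkey');
```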

With PostgreSQL index defragmentation should generally be handled automatically by the Autovacuum daemon. If you don't use the autovacuum daemon, or if it isn't able to keep up, you can always reindex problematic indexes.
Determining which indexes may be badly fragmented isn't particularly straightforward, and it's discussed at length in this blog post and in this PostgreSQL wiki article.


How to make big postgres db faster?

I have a big Postgres database (around 75 GB) and queries are very slow. Is there any way to make them faster?
About database:
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
public | fingerprints | table | postgres | permanent | heap | 35 GB |
public | songs | table | postgres | permanent | heap | 26 MB |
public | songs_song_id_seq | sequence | postgres | permanent | | 8192 bytes |
\d+ fingerprints
Table "public.fingerprints"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
hash | bytea | | not null | | extended | | |
song_id | integer | | not null | | plain | | |
offset | integer | | not null | | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"ix_fingerprints_hash" hash (hash)
"uq_fingerprints" UNIQUE CONSTRAINT, btree (song_id, "offset", hash)
Foreign-key constraints:
"fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
No need to write to database, only read. All queries are very simple:
SELECT song_id
FROM fingerprints
WHERE hash = X; -- X is a bytea fingerprint value
EXPLAIN(analyze, buffers, format text) SELECT "song_id", "offset" FROM "fingerprints" WHERE "hash" = decode('eeafdd7ce9130f9697','hex');
Index Scan using ix_fingerprints_hash on fingerprints (cost=0.00..288.28 rows=256 width=8) (actual time=0.553..234.257 rows=871 loops=1)
Index Cond: (hash = '\xeeafdd7ce9130f9697'::bytea)
Buffers: shared hit=118 read=749
Planning Time: 0.225 ms
Execution Time: 234.463 ms
(5 rows)
234 ms looks fine for a single query. But in reality there are 3000 queries at a time, and that takes about 600 seconds. It is an audio recognition application, so the algorithm works like that.
About indexes:
CREATE INDEX "ix_fingerprints_hash" ON "fingerprints" USING hash ("hash");
For pooler I use Odyssey.
Little bit of info from config:
shared_buffers = 4GB
huge_pages = try
work_mem = 582kB
maintenance_work_mem = 2GB
effective_io_concurrency = 200
max_worker_processes = 24
max_parallel_workers_per_gather = 12
max_parallel_maintenance_workers = 4
max_parallel_workers = 24
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 16GB
min_wal_size = 4GB
random_page_cost = 1.1
effective_cache_size = 12GB
Info about hardware:
Xeon 12 core (24 threads)
NVME disk
Would the database get faster if I bought more RAM so that the whole DB fits in memory (128 GB, for example)? And which parameters should I change to tell Postgres to keep the DB in RAM?
I have read several topics about pg_tune etc., but my experiments don't show any good results.
Increasing the RAM so that everything can stay in cache (perhaps after using pg_prewarm to get it into cache in the first place) would certainly work. But it is expensive and shouldn't be necessary.
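A minimal sketch of the pg_prewarm step, assuming the table and index names from the question. With shared_buffers = 4GB the 35 GB table cannot fit into PostgreSQL's own buffer pool, so the 'read' mode is used here to pull the blocks into the operating system cache instead of the default 'buffer' mode.

```sql
-- pg_prewarm ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read the table and the index through the OS cache.
-- Each call returns the number of blocks it processed.
SELECT pg_prewarm('fingerprints', 'read');
SELECT pg_prewarm('ix_fingerprints_hash', 'read');
```

This only helps if the machine has enough free RAM to keep those blocks cached afterwards.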
Having a hash index on something which is already a hashed value is probably not very helpful. Have you tried just a default (btree) index instead?
If you CLUSTER the table on the index over the column named "hash" (which you can only do if it is a btree index) then rows with the same hash code should mostly share the same table page, which would greatly cut down on the number of different buffer reads needed to fetch them all.
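The two suggestions above could look roughly like this. The index name ix_fingerprints_hash_btree is a hypothetical name chosen for illustration; everything else comes from the question.

```sql
-- A plain CREATE INDEX builds a B-tree (hypothetical index name).
CREATE INDEX ix_fingerprints_hash_btree ON fingerprints (hash);

-- Rewrite the table in index order. This takes an ACCESS EXCLUSIVE lock
-- and needs enough free disk space for a full copy of the table.
CLUSTER fingerprints USING ix_fingerprints_hash_btree;

-- Refresh planner statistics after the rewrite.
ANALYZE fingerprints;
```

Since the database is read-only, the physical ordering does not degrade over time and the CLUSTER only has to be done once.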
If you could get it to do a bitmap heap scan instead of an index scan, then it should be able to have a large number of read requests outstanding at a time, due to effective_io_concurrency. But the planner does not account for effective_io_concurrency when doing planning, which means it won't choose a bitmap heap scan specifically to get that benefit. Normally an index read finding hundreds of rows on different pages would automatically choose a bitmap heap scan method, but in your case it is probably the low setting of random_page_cost which is inhibiting it from doing so. The low setting of random_page_cost is probably reasonable in itself, but it does have this unfortunate side effect.
A problem with this strategy is that it doesn't reduce the overall amount of IO needed, it just allows the reads to overlap and so make better use of multiple IO channels. But if many sessions are running many instances of this query at the same time, they will start filling up those channels and so start competing with each other. So the CLUSTER method is probably superior, as it gets the same answer with less IO. If you want to play around with bitmap scans, you could temporarily increase random_page_cost or temporarily set enable_indexscan to off.
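A sketch of that experiment, limited to the current session so nothing in the server configuration changes. The query is the one from the question; the alternative random_page_cost value of 4 is simply the built-in default.

```sql
-- Discourage plain index scans for this session only.
SET enable_indexscan = off;
-- Alternatively: SET random_page_cost = 4;

-- The plan should now show a Bitmap Heap Scan on top of a Bitmap Index Scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT song_id, "offset"
FROM fingerprints
WHERE hash = decode('eeafdd7ce9130f9697', 'hex');

-- Restore the previous behaviour.
RESET enable_indexscan;
```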
No need to write to database, only read.
So the DB is read-only.
And in comments:
The DB worked fine on a small amount of data (a few GB), but after I filled the database it started to slow down.
So indexes have been built up incrementally.
UNIQUE CONSTRAINT on (song_id, "offset", hash)
I would replace that with:
ALTER TABLE fingerprints
DROP CONSTRAINT uq_fingerprints
, ADD CONSTRAINT uq_fingerprints UNIQUE(hash, song_id, "offset") WITH (FILLFACTOR = 100);
This enforces the same constraint, but the leading hash column in the underlying B-tree index now supports the filter on hash in your displayed query. And the fact that all needed columns are included in the index, further allows much faster index-only scans. The (smaller) index should also be more easily cached than the (bigger) table (plus index).
See: Is a composite index also good for queries on the first field?
Also rewrites the index in pristine condition, and with FILLFACTOR 100 for the read-only DB. (Instead of the default 90 for a B-tree index.)
Hash index on (hash) and CLUSTER
The name of the column "hash" has nothing to do with the name of the index type, which also happens to be "hash". (The column should probably not be named "hash" to begin with.)
If (and only if) you also have other queries centered around one or a few hash values that cannot use index-only scans (and you actually see faster queries with the hash index than without), keep the hash index in addition, and optimize it. (Else drop it!)
ALTER INDEX ix_fingerprints_hash SET (FILLFACTOR = 100);
An incrementally grown index may end up with bloat, or with unbalanced overflow pages in the case of a hash index. REINDEX should take care of that. While at it, increase FILLFACTOR to 100 (from the default 75 for a hash index) for your read-only (!) DB. You can REINDEX to make the change effective.
REINDEX INDEX ix_fingerprints_hash;
Or you can CLUSTER (like jjanes already suggested) on the rearranged B-tree index from above:
CLUSTER fingerprints USING uq_fingerprints;
Rewrites the table and all indexes; rows are physically sorted according to the given index, so "clustered" around the leading column(s). Effects are permanent for your read-only DB. But index-only scans do not benefit from this.
When done optimizing, run once:
VACUUM ANALYZE fingerprints;
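To check whether the rearranged unique index is actually used for index-only scans, the query from the question can be explained again. This is only a sketch of what to look for; the exact plan depends on the data and settings.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT song_id, "offset"
FROM fingerprints
WHERE hash = decode('eeafdd7ce9130f9697', 'hex');
-- Look for a node "Index Only Scan using uq_fingerprints" and a low
-- "Heap Fetches" count. A high count suggests the visibility map is
-- not up to date, which the VACUUM above is meant to fix.
```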
The tiny setting for work_mem stands out:
work_mem = 582kB
Even the (very conservative!) default is 4MB.
But after reading your question again, it would seem you only have tiny queries. So maybe that's ok after all.
Else, with 16GB RAM you can typically afford 100 times as much. It depends on your workload, of course.
Many small queries, many parallel workers --> keep small work_mem (like 4MB?)
Few big queries, few parallel workers --> go high (like 256MB? or more)
Large amounts of temporary files written in your database over time, and mentions of "disk" in the output of EXPLAIN ANALYZE would indicate the need for more work_mem.
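One way to look for those symptoms is the cumulative statistics view pg_stat_database. The work_mem value below is an illustrative assumption, not a recommendation for this workload.

```sql
-- Temporary file usage for the current database since the last stats reset.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();

-- Raise work_mem for the current session only, then re-run the slow
-- query under EXPLAIN (ANALYZE) and check whether "Disk" disappears
-- from the sort and hash nodes.
SET work_mem = '64MB';
```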
Additional questions
Would the database get faster if I bought more RAM so that the whole DB fits in memory (128 GB, for example)?
More RAM almost always helps until the whole DB can be cached in RAM and all processes can afford all the work_mem they desire.
And which parameters should I change to tell Postgres to keep the DB in RAM?
Everything that's read from the database is cached automatically in system cache and Postgres cache, up to the limit of available RAM. (Setting work_mem too high competes for that same resource.)

PostgreSQL why table bloat ratio is higher than autovacuum_vacuum_scale_factor

I found that bloat ratio of feedback_entity is 48%
current_database | schemaname | tblname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | feedback_entity | 5743878144 | 2785599488 | 48.4968416488746 | 100 | 2785599488 | 48.4968416488746 | f
but when I check the autovacuum settings, the scale factor is set to 10%:
stackdb=> show autovacuum_vacuum_scale_factor;
(1 row)
stackdb=> show autovacuum_vacuum_threshold;
(1 row)
Autovacuum setting is on.
Autovacuum for the mentioned table is running regularly at the defined threshold.
My question is: when autovacuum runs at 10% dead tuples, why would bloat grow to 48%? I have seen similar behaviour in hundreds of databases and tables. Why does table bloat always increase and never come down after a vacuum?
The query that you used to calculate the table bloat is unreliable. To determine the actual bloat, use the pgstattuple extension and query like this:
SELECT * FROM pgstattuple('public.feedback_entity');
But the table may really be bloated. There are two major reasons for that:
autovacuum runs and finishes in a reasonable time, but it cannot clean up the dead tuples. That may be because there is a long-running open transaction, an abandoned replication slot or a prepared transaction. See this article for details.
autovacuum runs too slow, so that dead rows are generated faster than it can clean them up. The symptoms are lots of dead tuples in pg_stat_user_tables and autovacuum processes that keep running forever. The straightforward solution is to use ALTER TABLE to increase autovacuum_vacuum_cost_limit or reduce autovacuum_vacuum_cost_delay for the afflicted table. An alternative approach, if possible, is to use HOT updates.
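The causes listed above can be checked with the standard catalog views. This is a rough sketch; the table name comes from the question, and the cost settings at the end are illustrative assumptions that need tuning for the actual hardware.

```sql
-- Dead tuples and the last autovacuum run for the table.
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'feedback_entity';

-- Sessions holding back the cleanup horizon, oldest first.
SELECT pid, state, xact_start, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Replication slots; an inactive slot with an old xmin blocks cleanup.
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;

-- Prepared transactions that were never committed or rolled back.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;

-- If autovacuum is merely too slow: let it work harder on this table.
ALTER TABLE public.feedback_entity
    SET (autovacuum_vacuum_cost_limit = 2000,
         autovacuum_vacuum_cost_delay = 0);
```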

PostgreSQL index bloat ratio more than table bloat ratio and autovacuum_vacuum_scale_factor

Index bloat is reaching 57%, while table bloat is only 9% and autovacuum_vacuum_scale_factor is only 10%.
What is more surprising is that even the primary key has 57% bloat. My understanding is that, since my primary key is an auto-incrementing single-column key, the primary key index should also have 10% dead tuples once 10% of the table consists of dead tuples.
When autovacuum runs at 10% dead tuples, it cleans them up. The dead tuple space then becomes bloat, which should be reused by new updates and inserts. But this isn't happening in my database; here the bloat keeps increasing.
Index Bloat:
current_database | schemaname | tblname | idxname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | data_entity | data_entity_pkey | 2766848000 | 1704222720 | 61.5943745373797 | 90 | 1585192960 | 57.2923760177646
Table Bloat:
current_database | schemaname | tblname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
stackdb | public | data_entity | 10106732544 | 1007288320 | 9.96650812332014 | 100 | 1007288320 | 9.96650812332014 | f
Autovacuum Settings:
stackdb=> show autovacuum_vacuum_scale_factor;
(1 row)
stackdb=> show autovacuum_vacuum_threshold;
(1 row)
autovacuum is on
autovacuum is running successfully at defined intervals.
PostgreSQL is running version 10.6. The same issue has been found with version 12.x.
First: an index bloat of 57% is totally healthy. Don't worry.
Indexes become more bloated than tables, because the empty space cannot be reused as freely as it can be in a table. The table, also known as “heap”, has no predetermined ordering: if a new row is written as the result of an INSERT or UPDATE, it ends up in the first page that has enough free space, so it is easy to keep bloat low if VACUUM does its job.
B-tree indexes are different: their entries have a certain ordering, so the database is not free to choose where to put the new row. So you may have to put it into a page that is already full, causing a page split, while elsewhere in the index there are pages that are almost empty.
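The bloat figures in the question come from an estimation query. To see the actual state of the primary key index, it can be measured directly with the pgstattuple extension. A short sketch, using the object names from the question:

```sql
-- Requires: CREATE EXTENSION pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.data_entity_pkey');

-- Index size next to table size, for context.
SELECT pg_size_pretty(pg_relation_size('public.data_entity_pkey')) AS index_size,
       pg_size_pretty(pg_relation_size('public.data_entity'))      AS table_size;
```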

Why does Postgres VACUUM FULL ANALYZE give a performance boost but VACUUM ANALYZE does not

I have a large database with the largest tables having more than 30 million records. The database server is a dedicated server with 64 cores and 128 GB RAM, running Ubuntu and Postgres 12. So the server is more powerful than we normally need. The server receives around 300-400 new records every second.
The problem is that after about a week or 10 days of use the database becomes extremely slow, so we have to perform VACUUM FULL ANALYZE, and after this everything goes back to normal. But we have to put our server in maintenance mode to perform this operation every week, which is a pain.
I came up with the idea that we don't need VACUUM FULL and can just run ANALYZE on the database, as it can run in parallel, but this didn't work. There were no performance gains after running it. Even when I run a plain VACUUM on the whole database and then run ANALYZE after it, it still doesn't give the kind of performance boost that we get from VACUUM FULL ANALYZE.
I know that VACUUM FULL copies the data from the old table to a new table and deletes the old table. But what else does it do?
I have also reindexed the 15 largest tables to confirm whether this would speed up the database. But this also didn't work.
So I had to execute VACUUM FULL ANALYZE, as I didn't see any other way. Now I am trying to identify the slow queries.
Thanks to jjanes, I was able to enable track_io_timing and also identified a few queries where indexes can be added. I am querying pg_stat_statements like this:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
And I get this result:
userid | 10
dbid | 16401
queryid | -3264485807545194012
query | update events set field1 = $1, field2 = $2 , field3= $3, field4 = $4 , field5 =$5 where id = $6
calls | 104559
total_time | 106180828.60536088
min_time | 3.326082
max_time | 259055.09376800002
mean_time | 1015.5111334783633
stddev_time | 1665.0715182035976
rows | 104559
shared_blks_hit | 4456728574
shared_blks_read | 4838722113
shared_blks_dirtied | 879809
shared_blks_written | 326809
local_blks_hit | 0
local_blks_read | 0
local_blks_dirtied | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
blk_read_time | 15074237.05887792
blk_write_time | 15691.634870000113
This query simply updates 1 record, and the table size is around 30 Million records.
Question: This query already uses an index. Can you please advise what the next step should be, and why it is slow? Also, what does the I/O information show?
As you say, VACUUM FULL is an expensive command. PostgreSQL's secret weapon is autovacuum, which monitors table statistics and targets tables with dead tuples. Read about how to tune it for the database as a whole, and possibly per table for the big tables.
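Per-table tuning is done with storage parameters. A minimal sketch for the events table from the statistics output above; the numeric values are illustrative assumptions, not tested recommendations.

```sql
-- Trigger autovacuum after about 1% of the rows changed instead of the
-- global default, and let each run do more work before it pauses.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_vacuum_cost_limit    = 2000
);

-- Check afterwards whether autovacuum keeps up.
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'events';
```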

Optimize a Postgres query that updates a big table

I have two huge tables:
Table "public.tx_input1_new" (100,000,000 rows)
Column | Type | Modifiers
blk_hash | character varying(500) |
blk_time | timestamp without time zone |
tx_hash | character varying(500) |
input_tx_hash | character varying(100) |
input_tx_index | smallint |
input_addr | character varying(500) |
input_val | numeric |
Indexes:
"tx_input1_new_h" btree (input_tx_hash, input_tx_index)
Table "public.tx_output1_new" (100,000,000 rows)
Column | Type | Modifiers
tx_hash | character varying(100) |
output_addr | character varying(500) |
output_index | smallint |
input_val | numeric |
Indexes:
"tx_output1_new_h" btree (tx_hash, output_index)
I want to update the first table from the other table:
UPDATE tx_input1 AS i
SET input_addr = o.output_addr,
input_val = o.output_val
FROM tx_output1 AS o
WHERE i.input_tx_hash = o.tx_hash
AND i.input_tx_index = o.output_index;
Before I executed this SQL command, I had already created indexes on these two tables:
CREATE INDEX tx_input1_new_h ON tx_input1_new (input_tx_hash, input_tx_index);
CREATE INDEX tx_output1_new_h ON tx_output1_new (tx_hash, output_index);
I used the EXPLAIN command to see the query plan, but it didn't use the indexes I created.
It took about 14-15 hours to complete this UPDATE.
What is the problem within it?
How can I shorten the execution time, or tune my database/table?
Thank you.
Since you are joining two large tables and there are no conditions that could filter out rows, the only efficient join strategy will be a hash join, and no index can help with that.
First there will be a sequential scan of one of the tables, from which a hash structure is built, then there will be a sequential scan over the other table, and the hash will be probed for each row found. How could any index help with that?
You can expect such an operation to take a long time, but there are some ways in which you could speed up the operation:
Remove all indexes and constraints on tx_input1 before you begin. Your query is one of the examples where an index does not help at all, but actually hurts performance, because the indexes have to be updated along with the table. Recreate the indexes and constraints after you are done with the UPDATE. Depending on the number of indexes on the table, you can expect a decent to massive performance gain.
Increase the work_mem parameter for this one operation with the SET command as high as you can. The more memory the hash operation can use, the faster it will be. With a table that big you'll probably still end up having temporary files, but you can still expect a decent performance gain.
Increase checkpoint_segments (or max_wal_size from version 9.5 on) to a high value so that there are fewer checkpoints during the UPDATE operation.
Make sure that the table statistics on both tables are accurate, so that PostgreSQL can come up with a good estimate for the number of hash buckets to create.
After the UPDATE, if it affects a large number of rows, you might consider running VACUUM (FULL) on tx_input1 to get rid of the resulting table bloat. This will lock the table for a longer time, so do it during a maintenance window. It will reduce the size of the table and as a consequence speed up sequential scans.
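Put together, the steps above might look like the following sketch. It assumes that tx_input1 and tx_output1 in the question's UPDATE are the tables shown as tx_input1_new and tx_output1_new, that the output table really has an output_val column as the UPDATE implies, and that the server is version 9.5 or later. The memory and WAL sizes are illustrative values only.

```sql
-- 1. Drop the index on the table being updated (recreated in step 7).
DROP INDEX tx_input1_new_h;

-- 2. More memory for the hash join, for this session only.
SET work_mem = '1GB';

-- 3. Fewer checkpoints during the bulk write (requires superuser).
ALTER SYSTEM SET max_wal_size = '16GB';
SELECT pg_reload_conf();

-- 4. Accurate statistics for the planner.
ANALYZE tx_input1_new;
ANALYZE tx_output1_new;

-- 5. The update itself.
UPDATE tx_input1_new AS i
SET input_addr = o.output_addr,
    input_val  = o.output_val
FROM tx_output1_new AS o
WHERE i.input_tx_hash = o.tx_hash
  AND i.input_tx_index = o.output_index;

-- 6. Reclaim the space left behind by the old row versions.
--    Takes an ACCESS EXCLUSIVE lock; run it in a maintenance window.
VACUUM (FULL, ANALYZE) tx_input1_new;

-- 7. Recreate the index on the compacted table.
CREATE INDEX tx_input1_new_h ON tx_input1_new (input_tx_hash, input_tx_index);
```

Running VACUUM (FULL) before recreating the index means the index is built only once, on the compacted table.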