Postgres index creation time - postgresql

Postgres has been creating a simple index for more than 2 days. Where can I see a log or any status information about the index creation? I have created indexes a couple of times before, and it took about an hour or less. The table has about 5.5 million rows, and I executed the following command:
CREATE INDEX collapse_url ON tablle (url, collapse)

There are two possibilities:
The CREATE INDEX statement is waiting for a lock.
You should be able to see that in the pg_stat_activity view.
If that is your problem, end all concurrent long running transactions (e.g. using pg_terminate_backend).
The CREATE INDEX statement is truly taking very long (unlikely with a few million rows).
In that case, you can speed up processing by increasing maintenance_work_mem before you create the index.
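A minimal sketch of both checks, assuming PostgreSQL 9.6 or later (for pg_blocking_pids and the wait_event columns). The pids 12345 and 23456 and the '1GB' setting are placeholders, not values taken from the question:

```sql
-- Is the CREATE INDEX session waiting on a lock, and who is blocking it?
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start   AS running_for
FROM pg_stat_activity
WHERE query ILIKE 'create index%';

-- Inspect the blocking sessions before ending any of them.
-- 12345 is a hypothetical pid of the CREATE INDEX backend.
SELECT pid, usename, state, xact_start, query
FROM pg_stat_activity
WHERE pid = ANY (pg_blocking_pids(12345));

-- End one blocking session (23456 is a hypothetical pid).
SELECT pg_terminate_backend(23456);

-- If the build itself is slow, cancel it, then retry with more sort
-- memory for this session only ('1GB' is a placeholder value).
SET maintenance_work_mem = '1GB';
CREATE INDEX collapse_url ON tablle (url, collapse);
```

A wait_event_type of Lock together with a non-empty blocked_by array points to the first possibility; an active session with no blockers points to the second.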
There have been relevant improvements in this area in recent versions:
PostgreSQL v11 introduced parallel index builds.
PostgreSQL v12 introduced the view pg_stat_progress_create_index to monitor the progress of CREATE INDEX.
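PostgreSQL v12 also introduced REINDEX CONCURRENTLY. CREATE INDEX CONCURRENTLY itself has been available since v8.2; it avoids the long SHARE lock that a plain CREATE INDEX holds on the table and that blocks all writes.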
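On v12 or later, progress can be watched from a second session with something like the following sketch. The percentage column is only a rough indicator, because blocks_done and blocks_total are reset between phases; the worker count of 4 is an arbitrary example:

```sql
-- Progress of all index builds currently running (PostgreSQL 12+).
SELECT p.pid,
       p.command,
       p.phase,
       p.blocks_done,
       p.blocks_total,
       round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) AS pct_blocks,
       p.tuples_done,
       p.tuples_total
FROM pg_stat_progress_create_index AS p;

-- Allow up to 4 parallel workers for index builds started from this
-- session (PostgreSQL 11+). The server may still use fewer workers.
SET max_parallel_maintenance_workers = 4;
```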

Related

CREATE INDEX CONCURRENTLY is executed but the CONCURRENTLY option lost after creation?

In my Postgres 14.3 database on AWS RDS, I want to create an index without blocking other database operations. So I want to use the CONCURRENTLY option and I executed the following statement successfully.
CREATE INDEX CONCURRENTLY idx_test
ON public.ap_identifier USING btree (cluster_id);
But when checking the database with:
SELECT * FROM pg_indexes WHERE indexname = 'idx_test';
I only see the index definition without the CONCURRENTLY option.
I expected the index to be created with the CONCURRENTLY option.
Is there any database switch to turn this feature on, or why does it seem to ignore CONCURRENTLY?
As has been commented, CONCURRENTLY is not a property of the index, but an instruction to create the index without blocking concurrent writes. The resulting index does not remember that option in any way. Read the chapter "Building Indexes Concurrently" in the manual.
Creating indexes on big tables can take a while. The system view pg_stat_progress_create_index (available since Postgres 12) can be queried for progress reporting. While the build is running, CONCURRENTLY is still reflected in the view's command column.
As a consolation: once created, all indexes are "concurrent" anyway, in the sense that they are maintained automatically without blocking concurrent reads or writes (except for UNIQUE indexes that prevent duplicates).
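The difference can be seen with a sketch like this, using the idx_test index from the question. The first query only returns rows while a build is in progress:

```sql
-- While the build runs (PostgreSQL 12+): the command column shows
-- how the build was started, e.g. CREATE INDEX CONCURRENTLY.
SELECT pid, command, phase
FROM pg_stat_progress_create_index;

-- After the build: the stored definition carries no trace of CONCURRENTLY.
SELECT indexdef
FROM pg_indexes
WHERE indexname = 'idx_test';

-- What the catalog does record is whether the build completed successfully.
SELECT c.relname, i.indisvalid, i.indisready
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE c.relname = 'idx_test';
```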

Concurrent Insertions in PostgreSQL

The documentation for migrating to PostgreSQL 12.6 says
Concurrent insertions could lead to a corrupt index with entries placed in the wrong pages. It's recommended to reindex any GiST index that's been subject to concurrent insertions.
I am hoping for some clarity on the definition of a "concurrent insertion." Does it simply refer to the case when two transactions are attempting to update the same table?
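For the reindexing step that the release note recommends, a query along these lines lists the GiST indexes in the current database. This is only a sketch; my_gist_index is a hypothetical name, and REINDEX ... CONCURRENTLY assumes PostgreSQL 12 or later:

```sql
-- List all GiST indexes in the current database.
SELECT n.nspname            AS schema_name,
       c.relname            AS index_name,
       i.indrelid::regclass AS table_name
FROM pg_class AS c
JOIN pg_am        AS a ON a.oid = c.relam
JOIN pg_namespace AS n ON n.oid = c.relnamespace
JOIN pg_index     AS i ON i.indexrelid = c.oid
WHERE a.amname = 'gist'
  AND c.relkind = 'i';

-- Rebuild one of them without blocking writes (hypothetical index name).
REINDEX INDEX CONCURRENTLY my_gist_index;
```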

What is the difference between postgresql rebuild index and recreate index, which one is better?

I have a table whose index is too big (about 2 GB). When I restore the database to a VM, its size is only 200 MB, so I need to rebuild or recreate the index, and I will probably do this online.
What is the difference between re-building (reindex) and re-creating the index, and which one is better when I do it online? Particularly, which option allows querying the DB during the operation?
The REINDEX command locks out writes to the table and takes an ACCESS EXCLUSIVE lock on the index being rebuilt, which means that it will stall writes, and any reads that try to use that index, until the command has completed. If you can afford that kind of maintenance window, it's perfectly fine.
The alternative for online rebuilding is to create a new index using CREATE INDEX CONCURRENTLY, then drop the old one. This will take longer to complete, but allows access to the table while rebuilding the index.
Postgres 12 has added a REINDEX INDEX CONCURRENTLY command, which does what you want here. https://paquier.xyz/postgresql-2/postgres-12-reindex-concurrently/ https://www.depesz.com/2019/03/29/waiting-for-postgresql-12-reindex-concurrently/
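The two online options might look like this. The names my_table, my_column, my_index and my_index_new are hypothetical, and option 2 as written does not apply to an index that backs a PRIMARY KEY or UNIQUE constraint:

```sql
-- Check the current on-disk size of the index.
SELECT pg_size_pretty(pg_relation_size('my_index'));

-- Option 1 (PostgreSQL 12+): rebuild in place without blocking writes.
REINDEX INDEX CONCURRENTLY my_index;

-- Option 2 (older versions): build a replacement, then swap.
-- None of these statements can run inside a transaction block,
-- except the final rename.
CREATE INDEX CONCURRENTLY my_index_new ON my_table (my_column);
DROP INDEX CONCURRENTLY my_index;
ALTER INDEX my_index_new RENAME TO my_index;
```

Both options need enough free disk space to hold the old and the new index at the same time.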

PostgreSQL vacuuming a frequently updating jsonb field

I have a Postgres table with a jsonb field. The field size is about 2-4 kB per row. My application updates 100k rows, 2000 times per day (each update changes 0.1-0.5% of the data in the field). Autovacuum is off; VACUUM FULL runs every night.
The vacuum frees about 100-300 GB every day and takes a long time to run, causing application downtime.
The question is: can I solve this problem while keeping the jsonb field, or must I split that field into other, simpler tables?
If your concern is the long downtime, then yes: VACUUM FULL requires an exclusive lock on the table being vacuumed for the entire run.
I suggest trying the pg_repack or pg_squeeze extension, depending on your Postgres version. Unlike CLUSTER and VACUUM FULL, they work online, without holding an exclusive lock on the processed tables for the duration of processing. These extensions are easy to install and use. They can reduce your downtime significantly and will also reduce the need to run VACUUM FULL.
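Before and after switching tools, it can help to measure how much dead space the table actually accumulates. A sketch, with my_table as a hypothetical table name; the last statement only installs the server-side part of pg_repack and assumes the extension package is already present on the server:

```sql
-- Dead-tuple count, last vacuum times and total size of the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname = 'my_table';

-- Install the server-side objects of pg_repack in this database.
-- The repack itself is then started from the pg_repack client program.
CREATE EXTENSION pg_repack;
```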

PostgreSQL concurrent index creation - invalid index

We have a requirement to contain index size without locking tables. I tried to use 'create index concurrently ..', but it resulted in an INVALID index being created on one of the systems. We tried the following:
- drop index
- drop index concurrently
- reindex table
However, these commands also got stuck intermittently. This makes the whole approach of creating indexes concurrently via a script vulnerable.
Any ideas how this can be made foolproof without manual intervention? If not, what are other effective ways to contain index sizes in PostgreSQL in an automated fashion on large and busy tables?
To use concurrent index builds in a script, you need to add checking logic between the commands, because CREATE INDEX CONCURRENTLY cannot be wrapped in a transaction. After the build, simply check whether the index is INVALID, and abort the script if it is.
I would also reverse the order of actions: first build the new index concurrently, and only if that succeeds, drop the old one.
https://www.postgresql.org/docs/current/static/sql-createindex.html:
If a problem arises while scanning the table, such as a deadlock or a
uniqueness violation in a unique index, the CREATE INDEX command will
fail but leave behind an "invalid" index. This index will be ignored
for querying purposes because it might be incomplete; however it will
still consume update overhead.
and
Another difference is that a regular CREATE INDEX command can be
performed within a transaction block, but CREATE INDEX CONCURRENTLY
cannot.
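The sequence described above could be sketched as follows. The names big_table, some_column, idx_old and idx_new are hypothetical, and the branching in step 3 has to be done by the calling script, since none of the CONCURRENTLY statements can run inside a transaction block:

```sql
-- Step 1: build the replacement index without blocking writes.
CREATE INDEX CONCURRENTLY idx_new ON big_table (some_column);

-- Step 2: check the result; indisvalid is false if the build failed midway.
SELECT i.indisvalid
FROM pg_index AS i
WHERE i.indexrelid = 'idx_new'::regclass;

-- Step 3a: if indisvalid is false, remove the leftover and abort or retry.
-- DROP INDEX CONCURRENTLY IF EXISTS idx_new;

-- Step 3b: if indisvalid is true, swap the indexes.
DROP INDEX CONCURRENTLY idx_old;
ALTER INDEX idx_new RENAME TO idx_old;

-- Periodic sweep: list any invalid indexes left behind by failed builds.
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;
```

Running the sweep at the start of each scripted run catches leftovers from a previous failure, which would otherwise keep consuming update overhead as the quoted documentation notes.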