Does using 'create index concurrently' in Postgres help future rows that will be inserted to be free from locks?

Does create index concurrently work only when the table is created for the first time, or does it also work for records that will be inserted in the future?

You are misunderstanding what concurrently does: it avoids locking the table for write access while the index is created.
Once the index is created, there is no difference between an index created with the concurrently option and one created without it. If new rows get inserted, the index is updated with the new values; inserting new rows does not "rebuild" the entire index.
Once an index is created, inserting rows into the table does not lock the table at all, regardless of how the index was created. Non-unique indexes always allow concurrent inserts into the table.
A unique index, however, will block concurrent inserts of the same value(s) - but not of different values.
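As a minimal sketch (the table and column names here are made up):

CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);
-- From now on, inserts are not blocked by the index; it is maintained automatically:
INSERT INTO orders (customer_id, total) VALUES (42, 99.95);

-- With a UNIQUE index, two sessions inserting the same key interact:
CREATE UNIQUE INDEX orders_ref_uq ON orders (reference_code);
-- Session A (transaction still open):
INSERT INTO orders (reference_code) VALUES ('ABC');
-- Session B blocks on the same value until session A commits or rolls back:
INSERT INTO orders (reference_code) VALUES ('ABC');
-- A different value proceeds immediately:
INSERT INTO orders (reference_code) VALUES ('XYZ');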

Insert query having DataFileRead wait event

There is an insert query inserting data into a partitioned table using a VALUES clause:
insert into t (c1, c2, c3) values (v1,v2,v3);
Database is AWS Aurora v11. Around 20 sessions run in parallel, executing ~2 million individual insert statements in total. I am seeing DataFileRead as the wait event and wondering why this wait event would show up for an insert statement. Would it be because each insert statement has to check whether the PK/UK keys already exist in the table before committing the insert? Or are there other reasons?
Each inserted row has to read the relevant leaf pages of each of the table's indexes in order to do index maintenance (inserting the index entries for the new row into their proper locations - it has to dirty each page, but it first needs to read the page before it can dirty it), and also to verify PK/UK constraints. And maybe it also needs to read index leaf pages of other tables' indexes in order to verify FKs.
If you insert the new tuples in the right order, you can hit the same leaf pages over and over in quick succession, maximizing cacheability. But if you have multiple indexes, there might be no ordering that can satisfy all of them.
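For example, if the rows arrive in a staging table first, inserting them sorted by the key of the most expensive index keeps consecutive rows on the same leaf pages (a sketch; staging_t and the ordering column are assumptions):

INSERT INTO t (c1, c2, c3)
SELECT c1, c2, c3
FROM staging_t
ORDER BY c1;  -- match the order of the index that is hardest to cache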

Future index creation on partition tables postgres [duplicate]

I am using PostgreSQL 14.1, and I re-created my live database using partitions for some tables.
Since I did that, I could create indexes while the server wasn't live, but now that it's live I can only create them using concurrently - and unfortunately, when I try to create an index concurrently, I get an error.
Running this:
create index concurrently foo on foo_table(col1, col2, col3);
provides the error:
ERROR: cannot create index on partitioned table "foo_table" concurrently
Now it's a live server, I cannot create indexes non-concurrently, and I need to create some indexes in order to improve performance. Any ideas how to do that?
Thanks
No problem. First, use CREATE INDEX CONCURRENTLY to create the index on each partition. Then use CREATE INDEX to create the index on the partitioned table. That will be fast, and the indexes on the partitions will become the partitions of the index.
Step 1: Create an index on the partitioned (parent) table
CREATE INDEX foo_idx ON ONLY foo (col1, col2, col3);
This step creates an invalid index. That way, none of the table partitions will get the index applied automatically.
Step 2: Create the index for each partition using CONCURRENTLY and attach to the parent index
CREATE INDEX CONCURRENTLY foo_idx_1
ON foo_1 (col1, col2, col3);
ALTER INDEX foo_idx
ATTACH PARTITION foo_idx_1;
Repeat this step for every partition index.
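If there are many partitions, the system catalogs can be used to script this step; a sketch using pg_inherits:

SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'foo_table'::regclass;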
Step 3: Verify that the parent index created at the beginning (Step 1) is valid. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically.
SELECT * FROM pg_index WHERE pg_index.indisvalid = false;
The query should return zero rows. If that's not the case, check your script for mistakes.

incremental update of a postgres index

I have inserted a lot of data (more than 2 million documents) into a table and created a full-text search index using GIN, and it works great. I can query the database and retrieve the appropriate documents rapidly.
Regularly, I collect new data that I want to insert into the database. What I would like to do is update my index with the new data only, but I have failed so far. I don't want to drop the index and recreate it, because that takes ages. I basically would like to do an incremental update of the index. I can do it on the fly while the data is being inserted, but this is very, very slow. I read that creating an index on already-inserted data is faster (true), so I guessed that updating an index with only the new data should be possible. But I haven't managed it so far.
I use postgresql 12.
Can anybody help me, please?
There is no way to suspend adding values to the index while you load data.
But GIN indexes already have a feature to optimize that: the GIN fast update technique.
Set the gin_pending_list_limit storage parameter on the index to a high value; new entries then accumulate in a pending list instead of being merged into the main index right away. Once you are done with the bulk load, VACUUM the table to integrate the pending list into the main index.
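A sketch of what that could look like; the index and table names are placeholders, and gin_pending_list_limit is in kilobytes:

ALTER INDEX docs_fts_idx SET (fastupdate = on);                   -- on by default
ALTER INDEX docs_fts_idx SET (gin_pending_list_limit = 1048576);  -- 1 GB
-- ... bulk load the new documents ...
VACUUM docs;  -- merges the pending list into the main GIN structure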
An alternative approach is to use partitioning and load one partition at a time. Then create the index on that partition and attach it to the partitioned table.

Bulk update Postgres table

I have a table with around 200 million records and I have added 2 new columns to it. Now the 2 columns need values from a different table. Nearly 80% of the rows will be updated.
I tried update but it takes more than 2 hours to complete.
The main table has a composite primary key of 4 columns. I dropped it, and also dropped an index that was present on a column, before updating. Now the update takes a little over an hour.
Is there any other way to speed up this update process (like batch processing)?
Edit: I used the other table (the one from which values are matched for the update) in the FROM clause of the UPDATE statement.
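Presumably the statement looks roughly like this (a sketch; all names are placeholders):

UPDATE main_table m
SET new_col1 = s.val1,
    new_col2 = s.val2
FROM source_table s
WHERE m.id = s.id;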
Not really. Make sure that max_wal_size is high enough that you don't get too many checkpoints.
After the update, the table will be bloated to about twice its original size.
That bloat can be avoided if you update in batches and VACUUM in between, but that will not make processing faster.
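A sketch of that batch-and-VACUUM pattern, assuming an integer key that can be split into ranges:

UPDATE main_table m
SET new_col1 = s.val1, new_col2 = s.val2
FROM source_table s
WHERE m.id = s.id
  AND m.id BETWEEN 1 AND 10000000;  -- first batch
VACUUM main_table;                  -- makes the dead row versions reusable
-- repeat with the next id range ...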
Do you need the whole update in a single transaction? I had a quite similar problem, with a table that was under heavy load and a column that needed a NOT NULL constraint. To deal with it, I took these steps:
Add the columns without constraints like NOT NULL, but with defaults. That way it goes really fast.
Update the columns in batches, e.g. 1000 rows per transaction. In my case the load on the DB rose, so I had to add a small delay between batches.
Alter the columns to add the NOT NULL constraints.
That way you don't block the table for a long time - though that is not a direct answer to your question.
First, to validate where you are, I would check I/O statistics (e.g. iostat) to see whether disk throughput is the limit. To speed things up, I would consider:
a higher free space map setting, so the DB is aware of entries that can be removed - but note that if pages are packed to the limit, this won't bring much;
removing foreign keys that reference the table, to stop them locking it;
removing all indexes, since they slow the update down, and recreating them afterwards - that just slices the problem the other way, but it is an option, so it counts.
There are two types of solutions to your problem.
1) This approach works if your main table does not receive updates or inserts during the process (a sketch follows the steps below):
First, create the same table schema, without the composite primary key or indexes, under a different name.
Then insert the data into the new table, joining in the data from the other table.
Apply all constraints and indexes to the new table after the insert.
Drop the old table and rename the new table to the old table's name.
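A sketch of this approach; all table and column names are placeholders:

CREATE TABLE main_new (LIKE main INCLUDING DEFAULTS);  -- no PK, no indexes yet

INSERT INTO main_new (c1, c2, c3, c4, new_col1, new_col2)
SELECT m.c1, m.c2, m.c3, m.c4, s.val1, s.val2
FROM main m
LEFT JOIN source s ON s.id = m.c1;  -- LEFT JOIN keeps the ~20% of rows without a match

ALTER TABLE main_new ADD PRIMARY KEY (c1, c2, c3, c4);
-- recreate any other indexes here, then swap the tables:
DROP TABLE main;
ALTER TABLE main_new RENAME TO main;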
2) Or you can use a trigger to fill those two columns on insert and update events. (This will make insert and update operations slightly slower; a sketch follows.)
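A sketch of the trigger variant, again with placeholder names:

CREATE FUNCTION fill_new_cols() RETURNS trigger AS $$
BEGIN
  SELECT s.val1, s.val2 INTO NEW.new_col1, NEW.new_col2
  FROM source s
  WHERE s.id = NEW.c1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER main_fill_new_cols
BEFORE INSERT OR UPDATE ON main
FOR EACH ROW EXECUTE FUNCTION fill_new_cols();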

PostgreSQL - reindex when adding a new index

I have a table with 100k records and no indexes. I created a new index on a column that is used for a left join.
Do I need to reindex my table?
Creation of the index took a few ms, so I am guessing that queries cannot use this index (no data) until I reindex the table (in case I had other indexes, I would reindex only that index - I read the manual).
I can't find any information on when a new index is populated with data. Is this done automatically? When?
Once CREATE INDEX has finished, the index is ready to be used. There is no need to run REINDEX after that.
From the REINDEX documentation page:
REINDEX is similar to a drop and recreate of the index in that the index contents are rebuilt from scratch. However, the locking considerations are rather different. REINDEX locks out writes but not reads of the index's parent table.
That means REINDEX behaves similarly to a DROP followed by a CREATE.
And from the CREATE INDEX documentation page:
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished.
I think this unambiguously shows that creating an index also populates it.
Whether or not a specific query uses the index depends on many different things though. If your query doesn't use the index, you need to post the query, the table definitions (e.g. as a create table statement), the index you have defined and the output of explain (analyze, verbose) of your query.
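For example, to check whether the new index is actually picked up (hypothetical table and column names):

CREATE INDEX b_id_idx ON b (id);

EXPLAIN (ANALYZE, VERBOSE)
SELECT a.*, b.payload
FROM a
LEFT JOIN b ON b.id = a.b_id;
-- The plan shows an index scan on b_id_idx if the planner chooses it;
-- no REINDEX is needed between CREATE INDEX and the query.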