PostgreSQL - reindex when adding a new index

I have a table with 100k records and no indexes. I created a new index on a column that is used for a left join.
Do I need to reindex my table?
Creating the index took only a few milliseconds, so I am guessing the query cannot use this index (no data) until I reindex the table (if I had other indexes, I would reindex only that one index - I read the manual).
I can't find any information on when a new index is populated with data. Is this done automatically? When?

Once CREATE INDEX has finished, the index is ready to be used. There is no need to run REINDEX after that.
From the REINDEX documentation page:
REINDEX is similar to a drop and recreate of the index in that the index contents are rebuilt from scratch. However, the locking considerations are rather different. REINDEX locks out writes but not reads of the index's parent table.
That means REINDEX behaves similarly to a CREATE after a DROP.
And from the CREATE INDEX documentation page:
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished.
I think this unambiguously explains that creating an index also populates it.
Whether or not a specific query uses the index depends on many different things though. If your query doesn't use the index, you need to post the query, the table definitions (e.g. as a create table statement), the index you have defined and the output of explain (analyze, verbose) of your query.
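For illustration, a minimal sketch of that kind of diagnostic (all table, column and index names here are made up):
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- usable immediately; check whether the planner actually picks it:
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;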

Related

incremental update of a postgres index

I have inserted a lot of data (more than 2 million documents) into a table and created a full-text search index using GIN, and it works great. I can query the database and retrieve the appropriate documents rapidly.
Regularly, I collect new data that I can insert into the database. What I would like to do is update my index with the new data only, but I have failed so far. I don't want to drop the index and recreate it, because that takes ages. I basically would like to do an incremental update of the index. I can do it on the fly as data is being inserted, but this is very, very slow. I read that creating an index on already-inserted data is faster (true), so I guessed that updating an index for just the new data could be done. But I haven't managed it so far.
I use PostgreSQL 12.
Can anybody help me, please?
There is no way to suspend adding values to the index while you load data.
But GIN indexes already have a feature to optimize that: the GIN fast update technique.
Set the gin_pending_list_limit storage parameter on the index to a high value. Once you are done with the bulk load, VACUUM the table to integrate the pending list into the main index.
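As a rough sketch (the index and table names are hypothetical, and the limit value, which is in kB, is just an example):
ALTER INDEX documents_fts_idx SET (gin_pending_list_limit = 1048576);
-- ... bulk load the new documents ...
VACUUM documents;  -- merges the pending list into the main GIN structure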
An alternative approach is to use partitioning and load a partition at once. Then create the index on the partition and attach it to the partitioned table.
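A sketch of that approach, assuming a table documents that is range-partitioned on a date column and has a text column body (all names are made up):
CREATE TABLE documents_2024 (LIKE documents INCLUDING DEFAULTS);
-- bulk load into documents_2024 while it is still a standalone table
CREATE INDEX ON documents_2024 USING gin (to_tsvector('english', body));
ALTER TABLE documents ATTACH PARTITION documents_2024
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');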

Does using 'create index concurrently' in Postgres help future rows which will be inserted to be free from locks?

Does create index concurrently only apply when the index is first created, or does it also work for records that will be inserted in the future?
You are misunderstanding what concurrently does: it avoids locking the table for write access while the index is created.
Once the index is created, there is no difference between an index created with the concurrently option and one created without it. If new rows get inserted, the index is updated with the new values. Inserting new rows does not "rebuild" the entire index.
Once an index is created, inserting rows into the table does not lock the table at all, regardless of how the index was created. Non-unique indexes always allow concurrent insert to the table.
A unique index however will block concurrent inserts for the same value(s) - but not for different values.
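For example (hypothetical names):
-- built without blocking writers on orders:
CREATE UNIQUE INDEX CONCURRENTLY orders_ref_uniq ON orders (reference_no);
-- afterwards it behaves like any other unique index: two sessions inserting
-- the same reference_no block each other; different values do not.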

SQLite ANALYZE breaks indexes

I have a table that contains about 500K rows. The table has an index on the 'status' column. So I run the following explain command:
EXPLAIN QUERY PLAN SELECT * FROM my_table WHERE status = 'ACTIVE'
Results in a predictable 'explanation'...
SEARCH TABLE my_table USING INDEX IDX_my_table_status (status=?) (~10 rows)
After many additional rows were added to the table, I ran ANALYZE. Afterwards, queries seemed much slower, so I re-ran my explain and now see the following:
SCAN TABLE my_table (~6033 rows)
First thing I notice is that BOTH row estimates are WAY off. The biggest concern is the fact that the index seems to be skipped once ANALYZE is run. I tried REINDEX - to no avail. The only way I can get the indexes back is to drop them, then re-create them. Has anyone seen this? Is this a bug? Any ideas what I am doing wrong? I have tried this on multiple databases and I see the same results. This is on my PC, on Mac, and on the iPhone/iPad - all the same results.
When SQLite fetches rows from a table using an index, it has to read the index pages first, and then read all the table's pages that contain one or more matching records.
If there are many matching records, almost all the table's pages are likely to contain one, so going through the index would require reading more pages.
However, SQLite's query planner does not have information about the record sizes in the index or the table, so it's possible that its estimates are off.
The information collected by ANALYZE is stored in the sqlite_stat1 and maybe sqlite_stat3 tables.
Please show what information was recorded about your table.
If that information does not reflect the true distribution of your data, you can try running ANALYZE again, or just delete that information from the sqlite_stat* tables.
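For example, assuming your table is named my_table:
SELECT * FROM sqlite_stat1 WHERE tbl = 'my_table';  -- inspect the stored estimates
DELETE FROM sqlite_stat1 WHERE tbl = 'my_table';    -- discard them so the planner falls back to defaults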
You can force going through an index if you use ORDER BY on the indexed field.
(INDEXED BY is, as its documentation says, not intended for use in tuning the performance of a query.)
If you do not need to select all fields of the table, you can speed up specific queries by creating an index on those queries' fields so that you have a covering index.
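A sketch of such a covering index, assuming the query only needs the id column besides status:
CREATE INDEX idx_my_table_status_id ON my_table(status, id);
-- SELECT id FROM my_table WHERE status = 'ACTIVE' can now be answered from the index alone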
It's not uncommon for a query execution plan to avoid using an existing index on a low-cardinality column like "status", which probably only has a few distinct values. It's often faster for the lookups to be performed by scanning the db table. (Some DBAs recommend never indexing low-cardinality columns.)
However, based on the wildly varying row counts in the explain plan, I'm guessing that SQLite's 'analyze' performs similarly to MySQL's 'analyze' when using the InnoDB storage engine. MySQL's 'analyze' does a random set of dives into the table data to determine row count, index cardinality, etc. Because of the random dives, the statistics may vary after each 'analyze' is run, and result in differing query execution plans. Low-cardinality columns are even more susceptible to incorrect stats, as, for example, the random dives may indicate that the majority of the rows in your table have an "active" status, making it more efficient to table scan rather than use the index. (I'm no SQLite expert, so someone please chime in if my hunch about the 'analyze' behavior is incorrect.)
You can try testing the use of the index in the query using "indexed by" (see http://www.sqlite.org/lang_indexedby.html), although forcing the use of indexes is usually a last resort. Different RDBMSs have different solutions to the low-cardinality problem, such as partitioning, using bitmap indexes, etc. I would recommend researching SQLite-specific solutions to querying/indexing on low-cardinality columns.
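If you do want to experiment with it anyway, the syntax looks like this (using the index name from the question):
SELECT * FROM my_table INDEXED BY IDX_my_table_status WHERE status = 'ACTIVE';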

Is the speed of a PostgreSQL SELECT adversely affected by too many indexes on the table?

I have read that having a lot of indexes on a database can seriously hurt performance, but in the PostgreSQL docs I can't find anything about it.
I have a very big table with something like 100 columns and a billion rows, and I often have to do a lot of searches on a lot of different fields.
Will the performance of the PostgreSQL table drop if I add a lot of indexes (maybe 10 single-column unique indexes and 5 or 7 three-column indexes)?
EDIT: By performance drop I mean the performance when fetching rows (select); the database is updated once a month, so update and insert time are not an issue.
Indexes are maintained when the content of the table is modified (i.e., INSERT, UPDATE, DELETE).
The query planner of PostgreSQL can decide when to use an index and when it's not needed and a sequential scan is more optimal.
So having too many indexes will hurt the modifying performance, not the fetching.
The indexes will have to be updated at each insert and update involving those columns.
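A quick way to see this on your own table is to compare plans; a sketch with made-up names:
EXPLAIN SELECT * FROM big_table WHERE customer_id = 42;  -- selective predicate: likely an index scan
EXPLAIN SELECT * FROM big_table;                         -- no filter: sequential scan, indexes unused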
I have some charts about that on my site: http://use-the-index-luke.com/sql/dml
An index is pure redundancy. It contains only data that is also stored in the table. During write operations, the database must keep those redundancies consistent. Specifically, it means that insert, delete and update not only affect the table but also the indexes that hold a copy of the affected data.
The chapter titles suggest the impact that indexes can have:
Insert — cannot take direct benefit from indexes
Delete — uses indexes for the where clause
Update — does not affect all indexes of the table

What is the command for Index optimization and update statistics for Oracle 10g and 11g?

I am loading a large number of rows into a table from a CSV data file. For every 10,000 records I want to update the indexes on the table for optimization (update statistics). Can anybody tell me what command I can use? Also, what is the equivalent of SQL Server's UPDATE STATISTICS in Oracle? Does "update statistics" mean index optimization or gathering statistics? I am using Oracle 10g and 11g. Thanks in advance.
Index optimization is a tricky question. You can COALESCE an index to eliminate adjacent empty blocks, and you can REBUILD an index to completely trash and recreate it. In my opinion, what you may wish to do for the period of your data load is make the indexes UNUSABLE, then REBUILD them when you're done.
ALTER INDEX my_table_idx01 UNUSABLE;
-- run loader process
ALTER INDEX my_table_idx01 REBUILD;
You only want to gather statistics once when you're done, and that's done with a call to DBMS_STATS, like so:
EXEC DBMS_STATS.GATHER_TABLE_STATS ('my_schema', 'my_table');
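If you also want the index statistics gathered in the same call, DBMS_STATS accepts a cascade parameter, e.g.:
EXEC DBMS_STATS.GATHER_TABLE_STATS ('my_schema', 'my_table', cascade => TRUE);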
I would recommend taking a different approach: drop the index(es), load the data, and then recreate the index. Oracle will then build a good index over the data you just loaded. Two things are accomplished here: the records will load faster, and the index will be rebuilt with a properly balanced tree. (Note: be careful here; if the table is really big, you may need to allocate a temporary tablespace for the build to work in.)
drop index my_index;
-- uber awesome loading process
create index my_index on my_table(my_col1, my_col2);