PostgreSQL slow update with index

A very simple UPDATE to reset one column in a table with approximately 5 million rows:
UPDATE t_Daily
SET Price = NULL;
Price is not part of any of the indexes on that table.
Running this without indexes takes 45s.
Running this with one or more indexes takes at least 20 mins (I keep having to stop it).
I fully understand why maintaining indexes affects the performance of insert and update statements, but this update makes no changes to any indexed column, so why does it have such a terrible effect on performance?
Any ideas much appreciated.

That is normal and expected: updating an index can be about ten times as expensive as updating the table itself, because the table has no ordering to maintain, but the index does.
If Price is not indexed, you can benefit from HOT (heap-only tuple) updates, which avoid updating the indexes. To make use of that, the table has to be defined with a fillfactor under 100, so that updated rows can find room in the same block as the original rows.
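A minimal sketch of that setup, using the t_Daily table from the question (the fillfactor of 70 is just an illustrative choice):
ALTER TABLE t_Daily SET (fillfactor = 70);
-- Only pages written after this change honour the new fillfactor; a table
-- rewrite such as VACUUM FULL (which takes an exclusive lock) applies it
-- to existing pages as well.
VACUUM FULL t_Daily;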

Found some further info (thanks to Laurenz Albe for the HOT tip).
This link https://malisper.me/postgres-heap-only-tuples/ states that:
Due to MVCC, an update in Postgres consists of finding the row being updated and inserting a new version of the row back into the database. The main downside to doing this is the need to re-add the row to every index.
So it re-writes every index entry, despite the update only touching a column that is not in any index.
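To check whether HOT updates are actually happening, one option (a sketch; the relname value assumes the table was created as unquoted t_Daily, which PostgreSQL folds to lower case) is the statistics view:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 't_daily';
-- n_tup_hot_upd counts the updates that avoided index maintenance.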

Related

Overhead/downtime of partially indexing a column when no records meet the condition

I need to partially index a column when a single condition is met (e.g. some_column = 'some_value'). I am worried about the customer impact of creating this new partial index and locking the table, and I am wondering how long that will take. In the databases where I am worried about the impact, there will be no records that meet the condition. Does this mean the overhead, and the time the table is locked, would be drastically less than if there were records to index at the time of index creation? The column in the WHERE condition is indexed.
It will not use the index on the column in the WHERE clause to speed up creation of the empty partial index. It will still scan the full table, taking however long that takes. Not needing to sort any tuples or generate any index leaf blocks will speed it up, but probably not drastically.
If you are afraid it will hold the lock too long, you can create the index CONCURRENTLY. This will take longer to do, but will hold a weaker lock while it does it. It will still need a strong lock at the beginning and at the end, but it will only be held momentarily.
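A sketch of the concurrent variant, with hypothetical table and index names around the condition from the question:
CREATE INDEX CONCURRENTLY some_partial_idx
    ON some_table (some_column)
    WHERE some_column = 'some_value';
-- Builds without blocking writes for most of its duration, at the cost of
-- a slower build; brief strong locks are still taken at the start and end.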

How PostgreSQL index deals with MVCC

In PostgreSQL, every update of a tuple creates a new tuple version. So for some period of time there can be many versions of the same tuple, and different transactions can see different versions of it (using the visibility rules).
The index is updated before the transaction completes. How does this work with snapshot isolation?
So when one transaction updates a tuple, is the index entry updated to point to the new version of the tuple?
Since PostgreSQL implements MVCC by keeping multiple versions of one row in the table at the same time, it also keeps multiple index entries for different versions of a single row (sometimes this can be avoided with heap-only tuples if the indexed entries are not modified during an update and the updated row is in the same table block as the original version).
The visibility information is not stored in the index, so to find the correct row version during an index scan, the table entries for all these index entries have to be checked (sometimes this can be avoided if the table block is known, via the visibility map, to contain only entries that are visible to everybody; this is an index-only scan).
Old index entries are removed along with old table entries during autovacuum.
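A tiny illustration of the index-only scan case (the table and index names are hypothetical, and the plan you get may differ):
CREATE INDEX t_a_idx ON t (a);
VACUUM t;  -- sets all-visible bits in the visibility map
EXPLAIN SELECT a FROM t WHERE a = 1;
-- Can report an Index Only Scan using t_a_idx, since the query reads only
-- the indexed column and all-visible pages need no heap check.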

How to improve insert speed with index on text column

I am using a PostgreSQL database for our project and doing some performance testing. We need to insert millions of records with indexed columns. The table has 5 columns. With an index on the integer column only, performance was good, but when I also created an index on the text column, insert performance dropped to about one eighth of what it was. My question is: how can I improve insert performance with an index on a text column?
Short answer is you can't.
It is well known that adding indexes on DB columns is a two-edged sword:
on one (positive) side, they speed up your read queries;
on the other, they add a performance penalty to insert/update/delete operations, and your data will occupy a little more disk space.
A possible solution would be to use a full-text search engine like Sphinx, which will index the text entities in your DB.
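For one-off bulk loads there is also a common pattern the answer above does not mention (sketched here with hypothetical table, column, and file names): drop the index, load the data, and rebuild the index once at the end, since building an index in one pass is cheaper than maintaining it row by row.
DROP INDEX IF EXISTS items_body_idx;
COPY items (id, body) FROM '/path/to/data.csv' WITH (FORMAT csv);
CREATE INDEX items_body_idx ON items (body);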

I have a massive table that I need to optimize. I think I need to use indexes, but I was hoping for some more information about them

So I have a large table that I query (select only) quite frequently. The table is around 12,000 rows long. Since the advent of iOS, the time that it is taking to run these select queries has gone up 4-5x.
I was told that I need to add an index to my table. The query that I am using looks like this:
SELECT * FROM book_content WHERE book_id = ? AND chapter = ? ORDER BY verse ASC
How can I create an index for this table? Is it a command I just run once? What exactly is the index going to do? I didn't learn about these in school so they still seem like some sort of magic to me at this point, so I was hoping to get a little instruction.
Thanks!
You want an index on book_id and chapter. Without an index, the server does a table scan, essentially loading the entire table into memory to do its search.
Do a quick search on the CREATE INDEX command for the RDBMS you are using. You create the index once, and every time you do an INSERT, DELETE, or UPDATE, the server updates the index automatically.
An index can be UNIQUE, and it can be on multiple fields (in your case, book_id and chapter). If you make it UNIQUE, the database will not allow you to insert a second row with the same key (in this case, the same book_id and chapter). On most servers, one index on two fields is different from two individual indexes on one field each.
A MySQL example would be:
CREATE INDEX id_chapter_idx ON book_content (book_id,chapter);
If you want only one record for each book_id, chapter combination, use this command:
CREATE UNIQUE INDEX id_chapter_idx ON book_content (book_id,chapter);
A PRIMARY INDEX is a special index that is UNIQUE and NOT NULL. Each table can only have one primary index. In fact, each table should have one primary index to ensure table integrity, especially during joins.
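Continuing the MySQL example, a primary key could be declared like this (assuming that book_id, chapter, and verse together identify a row uniquely, which is only a guess from the query in the question):
ALTER TABLE book_content ADD PRIMARY KEY (book_id, chapter, verse);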
You don't have to think of indexes as "magic".
An index on an SQL table is much like the index in a printed book - it lets you find what you're looking for without reading the entire book cover-to-cover.
For example, say you have a cookbook, and you're looking for recipes that involve chicken. The index in the back of the book might say something like:
chicken: 30,34,72,84
letting you know that you will find chicken recipes on those 4 pages. It's much faster to find this information in the index than by reading through the whole book, because the index is shorter, and (more importantly) it's in alphabetical order, so you can quickly find the right place in the index.
So, in general you want to create indexes on columns that you will regularly need to query (book_id and chapter, in your example).
When you declare a column as a primary key, an index is automatically generated on that column. In your case, since the table is used mostly for SELECTs, an index is ideal: indexes improve the time of selection queries and degrade the time of insertion. So you can create the indexes you think you need without worrying much about performance.
Indexes are a sensitive subject. If you consider using them, you need to be careful about how many you create. The primary key, or id, of each table should have a clustered index. For the rest, it depends on how you plan to use them. I am fuzzy on the subject of indexes myself, but from a seminar I watched just yesterday: you don't want too many indexes, because each one adds cost even when it is not being used.
Say you put an index on 5 out of 8 fields in a table, each one designed for a particular query somewhere in your software. When one query runs, it uses that one index and not the other four, yet all five still have to be maintained on every insert, update, and delete. If you add an index, make sure it is one that will be useful in many places, not just one.

Adding a new column to Table which contains live data

I have a large table consisting of over 60 millions records and I would like to add 2 new columns for data migration purposes. There are indexes on the table and some of them are large. So, by me adding the 2 new columns to the table, will I run the risk of slowing down the database whilst it attempts to add them and maybe time-out? Or will it just work?
I know that if I try to rearrange the columns, SQL Server will ask me to drop and re-create the table, so I definitely don't want this. Is this something everyone is challenged with?
We've had the same problem with column and index changes on larger tables.
I would simply add the columns using ALTER TABLE. The column order, though nice, is irrelevant.
If the columns are NULLable, the time is reasonable. If you want to add a default value and make them NOT NULL, that is obviously more work. In that case, I would consider adding them as NULLable, then setting a value, then changing them to NOT NULL, making it three steps you can do at different times. We do this to reduce the time window we need, even if the whole process takes longer.
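A sketch of that three-step approach in T-SQL (the table and column names are hypothetical, and the batch size is arbitrary):
ALTER TABLE dbo.BigTable ADD NewCol int NULL;               -- step 1: metadata-only, fast
WHILE EXISTS (SELECT 1 FROM dbo.BigTable WHERE NewCol IS NULL)
    UPDATE TOP (10000) dbo.BigTable                         -- step 2: backfill in batches
    SET NewCol = 0
    WHERE NewCol IS NULL;
ALTER TABLE dbo.BigTable ALTER COLUMN NewCol int NOT NULL;  -- step 3: enforce NOT NULL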