In Postgres I can run a query against the pg_stat_user_indexes view to verify whether an index was ever scanned. I have quite a few indexes with 0 scans and a few with fewer than 5 scans. I am considering removing those indexes, but first I want to know when they were last used. Is there a way to find out when the last index scan happened for a given index?
No, you cannot find out when an index was last used. I recommend that you take the usage count now and again a month from now. Then see if the usage count has increased.
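If it helps, here is a sketch of one way to do that, assuming you snapshot the statistics into a table of your own (the snapshot table name is invented):

-- Take a snapshot of today's counters:
CREATE TABLE index_usage_snapshot AS
SELECT now() AS taken_at, schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes;

-- A month later, see which indexes were scanned since the snapshot:
SELECT indexrelname,
       u.idx_scan - s.idx_scan AS scans_since_snapshot
FROM pg_stat_user_indexes u
JOIN index_usage_snapshot s USING (schemaname, relname, indexrelname)
ORDER BY scans_since_snapshot;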
Don't hesitate to drop indexes that are rarely used, unless you are dealing with a data warehouse. Even if the occasional query can take longer, all data modifications on the table will become faster, which is a net win.
I had a table of 200GB with an index of 49GB. Only insert and update operations happen on that table. I dropped the existing index and created a new one on the same columns. The new index size is only 6GB. I am using a Postgres database.
Can someone explain how the index size got reduced from 49GB to 6GB?
A newly created index is essentially optimally packed, sorted data. To insert some data somewhere in the middle while keeping the data optimally packed and sorted, you would on average have to rewrite half of the index with every insert.
That is not acceptable, so the database uses a more complicated and clever format for indexes (based on a B-tree data structure) that allows the logical order of index blocks to change without moving them on disk. The consequence is that after inserting some data in the middle, some of the index blocks are no longer 100% packed. The leftover space can be reused in the future, but only by values whose sort order places them in that block.
So, depending on your usage pattern, you can easily have index blocks only 10% packed on average.
This is compounded by the fact that when you update a row, both the old and the new version have to be present in the index at the same time. If you do a bulk update of the whole table, the index has to expand to hold twice the number of entries, if only briefly. But it will not shrink back as easily, because that would basically require rewriting the whole index.
The index size tends to grow at first and then stabilize after some usage, but the stable size is often nowhere near the size of a freshly created index.
You might want to tune autovacuum to be more aggressive, so that space in the table and its indexes that is no longer needed is reclaimed sooner and can therefore be reused sooner. This can make your index stabilize faster and at a smaller size. Also try to avoid very large bulk updates, or run VACUUM FULL tablename after a huge update.
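As a concrete sketch of those knobs (table and index names are placeholders; REINDEX ... CONCURRENTLY needs PostgreSQL 12 or later):

-- Make autovacuum kick in earlier on a heavily updated table:
ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.02);

-- Rebuild a bloated index without blocking writes:
REINDEX INDEX CONCURRENTLY big_table_client_id_idx;

-- After a huge bulk update, reclaim table and index space
-- (takes an exclusive lock and rewrites the table):
VACUUM FULL big_table;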
I have a collection that I am updating, adding a new field.
The document looks like:
{"A": "P145", "B":"adf", "C":[{"df":"14", "color":"blue"},{"df":17}],
"_id":ObjectID(....), "Synonyms":{"Synonym1": "value1",
"Synonym2": ["value1", "value2"]}}
In the update I am adding new elements to C.
I want to create an index on the fields A and B. There are 20206 unique values of A and B. The queries to the database will be based on these fields.
The "_id" is set by default.
I plan to do it with collection.ensure_index({"A":1, "B":1}, background=True)
How much time could it take? Will queries be faster than with the default index based on "_id"?
The amount of time it takes to add the index will depend on your hardware, but with 20206 records a simple index like the one you describe shouldn't take very long on most systems.
Queries that can use the index (i.e. where you filter on A and B, or just A, but not just B; compound indexes work from left to right, so unless the query includes A the index can't be used) will return their results much faster. Unless you are searching by _id, the default index on _id won't help you at all: without your proposed index, queries on A and B have to perform a full collection scan, which is orders of magnitude slower than an index scan.
Inserts will be slightly slower, as the index will need to be updated too, but again with a relatively small number of total documents this isn't likely to be a large overhead.
The updates that change C may well be faster if you are using A and B to identify which document to update, as they will benefit from the faster lookup; once the document is found, the update itself should not be affected, since the index should not need to change.
As the absolute performance will be specific to your hardware, if you're concerned about it the best thing to do is try it out on a copy of the data (on similar hardware) and measure whether the performance meets your needs. The output of explain() on your queries can be very informative in understanding how your indexes affect query performance.
Well, the time taken to create the index depends entirely on the hardware (system) you are using and the number of records. For ~20K records it should be quick, at most a few seconds in the worst case. Slightly off topic, but I see that you have passed the background=True option; it is probably not needed, as the background option is mainly used when indexing very large data sets. Please consider a few things when creating indexes, not just for this question but in general.
When you create an index in the foreground, it blocks operations and does not allow reads, which is the reason background=True is used. http://docs.mongodb.org/v2.2/administration/indexes/
The good part of foreground index creation is that the resulting indexes are more compact and better than background-built ones, so it should be preferred where possible.
The good news is that over the long run both background and foreground index creation deliver the same performance, and it doesn't matter which way the indexes were created. ... Happy Mongoing.. ;-)
I am updating ~8M rows in a table. The column I am updating is client_id, and the table has a composite index on user_id and client_id. Is this going to affect the index in some way?
Doing a large update will be slower with indexes, since the indexes will have to be updated also. It might be an issue, or not. Depends on many things.
After the update the indexes will still be correct, but a REINDEX might be in order to improve their space usage. If these 8M rows are the majority of the table, VACUUM FULL may be in order to reduce disk space usage, but if the table is heavily updated all the time it might not be worth it.
So if you want, you can drop the index, run the update and recreate the index, but it is impossible to say whether that would be faster than doing the update with the index in place.
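As an illustration of those options (table, column and index names are placeholders):

-- Check how much space the composite index takes before and after:
SELECT pg_size_pretty(pg_relation_size('users_user_id_client_id_idx'));

-- Rebuild it in place after the bulk update (locks out writes while it runs):
REINDEX INDEX users_user_id_client_id_idx;

-- Or the drop / update / recreate variant:
DROP INDEX users_user_id_client_id_idx;
-- ... run the 8M-row UPDATE here ...
CREATE INDEX users_user_id_client_id_idx ON users (user_id, client_id);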
I have a fragmentation problem on my production database. One of my main data tables is about 6GB (3GB of which is indexes, about 9M records) in size and has 94%(!) index fragmentation.
I know that reorganizing the indexes would solve this problem, BUT my database is on SQL Server 2008 R2 Express, which has a 10GB database size limit, and my database is already 8GB in size.
I have read a few blog posts about this issue, but none gave an answer for my situation.
Question 1:
How much of a size increase (% or GB) can I expect after reorganizing the indexes on that table?
Question 2:
Will dropping the index and then rebuilding the same index take less space? Time is not a factor for me at the moment.
Extra question:
Any other suggestions for dealing with database fragmentation? I know only that I should avoid shrinking like fire ;)
Having an index on key columns will improve joins and filters by removing the need for a table scan. A well-maintained index can drastically improve performance.
It is right that GUIDs make a poor choice for indexed columns, but by no means does that mean you should not create these indexes. Ideally a data type of INT or BIGINT would be advisable.
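One way to follow that advice, sketched with invented names: keep the GUID as a plain column if it is needed externally, but put the clustered primary key on an ever-increasing integer.

CREATE TABLE dbo.Example (
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,              -- compact, ever-increasing clustered key
    ExternalRef UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),  -- GUID kept, but not the clustered key
    Payload NVARCHAR(200) NULL
);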
For me, adding NEWID() as a default has shown some improvement in counteracting index fragmentation, but if all alternatives fail you may have to run index maintenance (rebuild, reorganize) operations more often than for other indexes. Reorganize needs some working space, but in your scenario, as time is not a concern, I would disable the index, shrink the database and then rebuild the index.
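For reference, a sketch of the usual check-then-maintain cycle on SQL Server (object names are placeholders; REORGANIZE works in place, while REBUILD needs free space roughly the size of the index):

-- Check fragmentation for one table:
SELECT i.name, ips.avg_fragmentation_in_percent, ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;

-- In-place defragmentation, cheap on space:
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable REORGANIZE;

-- The disable / shrink / rebuild route mentioned above:
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable DISABLE;
-- DBCC SHRINKDATABASE / SHRINKFILE here, then re-enable by rebuilding:
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable REBUILD;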
Based on your experience, is there any practical limit on the number of indexes per table in PostgreSQL? In theory there is not, as per the documentation, citation: "Maximum Indexes per Table Unlimited". But:
Is it the case that the more indexes you have, the slower the queries? Does it make a difference whether I have tens vs. hundreds or even thousands of indexes? I am asking because I have read the documentation on Postgres' partial indexes, which makes me think of some very creative solutions that, however, require a lot of indexes.
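(For reference, a partial index of the kind the documentation describes looks like the following; the table and predicate are invented.)

-- Index only the rows a hot query actually touches:
CREATE INDEX orders_pending_customer_idx
    ON orders (customer_id)
    WHERE status = 'pending';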
There is overhead in having a high number of indexes in a few different ways:
Space consumption, although this would be lower with partial indexes of course.
Query optimisation, through making the optimiser's choice of plan potentially more complex.
Table modification time, through the additional work in modifying indexes when a new row is inserted, or current row deleted or modified.
I tend by default to go heavy on indexing as:
Space is generally pretty cheap
Queries with bound variables only need to be optimised once
Rows generally have to be found much more often than they are modified, so it's generally more important to design the system for efficiently finding rows than it is for reducing overhead in making modifications to them.
The impact of missing a required index can be very high, even if the index is only required occasionally.
I've worked on an Oracle system with denormalised reporting tables having over 200 columns with 100 of them indexed, and it was not a problem. Partial indexes would have been nice, but Oracle does not support them directly (you use a rather inconvenient CASE hack).
So I'd go ahead and get creative, as long as you're aware of the pros and cons, and preferably you would also measure the impact that you're having on the system.
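One simple way to do that measurement is to check whether the planner actually uses a given index for the queries you care about; a sketch, reusing the invented orders example from above:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'pending';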